Re: `cat /dev/null > solr-8983-console.log` frees host's memory

2015-10-21 Thread Emir Arnautovic

Hi Eric,
As Shawn explained, the memory was freed because the OS had been using it 
to cache a portion of the log file.


Since you are already with Sematext I guess you are aware, but it doesn't 
hurt to remind you that we also have Logsene, which you can use to manage 
your logs: http://sematext.com/logsene/index.html


Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



On 20.10.2015 17:42, Shawn Heisey wrote:

On 10/20/2015 9:19 AM, Eric Torti wrote:

I had a 52GB solr-8983-console.log on my Solr 5.2.1 Amazon Linux
64-bit box and decided to `cat /dev/null > solr-8983-console.log` to
free space.

The weird thing is that when I checked Sematext I noticed the OS had
freed a lot of memory at the same exact instant I did that.

On that memory graph, the legend doesn't indicate which of the graph
colors represent each of the four usage types at the top -- they all
have blue checkboxes, so I can't tell for sure what changed.

If the number that dropped is "cached" (which I think is likely) then
everything is working exactly as it should.  The OS had simply cached a
large chunk of the logfile, exactly as it is designed to do, and once
the file was truncated, it stopped reserving that memory and made it
available.

https://en.wikipedia.org/wiki/Page_cache
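This is easy to reproduce on any Linux box; a quick sketch (the path and size below are illustrative, not from the original setup):

```shell
# Stand-in for a large console log; reading it pulls it into the page cache.
LOG=/tmp/console-demo.log
dd if=/dev/zero of="$LOG" bs=1M count=64 2>/dev/null
cat "$LOG" > /dev/null

grep '^Cached:' /proc/meminfo   # note the cached figure

# Truncate in place -- the same effect as `cat /dev/null > file`.
: > "$LOG"

grep '^Cached:' /proc/meminfo   # drops by roughly the file size
ls -l "$LOG"                    # the file itself is now 0 bytes
rm -f "$LOG"
```

The second Cached figure should come out lower than the first, because the kernel drops a file's cached pages as soon as the file is truncated.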

Thanks,
Shawn



RE: DevOps question : auto deployment/setup of Solr & Zookeeper on medium-large clusters

2015-10-21 Thread Davis, Daniel (NIH/NLM) [C]
Susheel, 

Our puppet stuff is very close to our infrastructure, using specific Netapp 
volumes and such, and assuming some files come from NFS.
It is also personally embarrassing to me that we still use NIS - doh!

-Original Message-
From: Susheel Kumar [mailto:susheel2...@gmail.com] 
Sent: Tuesday, October 20, 2015 8:34 PM
To: solr-user@lucene.apache.org
Subject: Re: DevOps question : auto deployment/setup of Solr & Zookeeper on 
medium-large clusters

Thanks, Davis, Jeff.

We are not using AWS.  Are there any ready-made scripts or frameworks for 
puppet available?

On Tue, Oct 20, 2015 at 7:59 PM, Jeff Wartes  wrote:

>
> If you’re using AWS, there’s this:
> https://github.com/LucidWorks/solr-scale-tk
> If you’re using chef, there’s this:
> https://github.com/vkhatri/chef-solrcloud
>
> (There are several other chef cookbooks for Solr out there, but this 
> is the only one I’m aware of that supports Solr 5.3.)
>
> For ZK, I’m less familiar, but if you’re using chef there’s this:
> https://github.com/SimpleFinance/chef-zookeeper
> And this might be handy to know about too:
> https://github.com/Netflix/exhibitor/wiki
>
>
> On 10/20/15, 6:37 AM, "Davis, Daniel (NIH/NLM) [C]" 
> 
> wrote:
>
> >Waste of money in my opinion.   I would point you towards other tools -
> >bash scripts and free configuration managers such as puppet, chef, salt,
> >or ansible.Depending on what development you are doing, you may want
> >a continuous integration environment.   For a small company starting out,
> >using a free CI, maybe SaaS, is a good choice.   A professional version
> >such as Bamboo, TeamCity, Jenkins are almost essential in a large 
> >enterprise if you are doing diverse builds.
> >
> >When you create a VM, you can generally specify a script to run after the
> >VM is mostly created.   There is a protocol (PXE Boot) that enables this
> >- a PXE server listens and hears that a new server with such-and-such
> >Ethernet Address is starting.   The PXE server makes it boot like a
> >CD-ROM/DVD install, booting from installation media on the network and
> >installing.Once that install is down, a custom script may be invoked.
> >  This script is typically a bash script, because you may not be able to
> >count on too much else being installed.   However, python/perl are also
> >reasonable choices - just be careful that the modules/libraries you are
> >using for the script are present.The same PXE protocol is used in
> >large on-premises installations (vCenter) and in the cloud 
> >(AWS/Digital Ocean).  We don't care about the PXE server - the point 
> >is that you can generally run a bash script after your install.
> >
> >The bash script can bootstrap other services such as puppet, chef, or 
> >salt, and/or setup keys so that push configuration management tools such
> >as ansible can reach the server.   The bash script may even be smart
> >enough to do all of the setup you need, depending on what other servers
> >you need to configure.   Smart bash scripts are good for a small company,
> >but for large setups, I'd use puppet, chef, salt, and/or ansible.
> >
> >What I tend to do is to deploy things in such a way that puppet 
> >(because it is what we use here) can setup things so that a "solradm" 
> >account can setup everything else, and solr and zookeeper are running as a 
> >"solrapp"
> >user using puppet.Then, my continuous integration server, which is
> >Atlassian Bamboo (you can also use tools such as Jenkins, TeamCity, 
> >BuildBot), installs solr as "solradm" and sets it up to run as "solrapp".
> >
> >I am not a systems administrator, and I'm not really in "DevOps", my 
> >job is to be above all of that and do "systems architecture" which I 
> >am lucky still involves coding both in system administration and applications
> >development.   So, that's my 2 cents.
> >
> >Dan Davis, Systems/Applications Architect (Contractor), Office of 
> >Computer and Communications Systems, National Library of Medicine, 
> >NIH
> >
> >-Original Message-
> >From: Susheel Kumar [mailto:susheel2...@gmail.com]
> >Sent: Tuesday, October 20, 2015 9:19 AM
> >To: solr-user@lucene.apache.org
> >Subject: DevOps question : auto deployment/setup of Solr & Zookeeper 
> >on medium-large clusters
> >
> >Hello,
> >
> >Resending to see opinion from Dev-Ops perspective on the tools for 
> >installing/deployment of Solr & ZK on large no of machines and 
> >maintaining them. I have heard Bladelogic or HP OO (commercial tools) 
> >etc. being used.
> >Please share your experience or pros / cons of such tools.
> >
> >Thanks,
> >Susheel
> >
> >On Mon, Oct 19, 2015 at 3:32 PM, Susheel Kumar 
> >
> >wrote:
> >
> >> Hi,
> >>
> >> I am trying to find the best practises for setting up Solr on new 
> >> 20+ machines  & ZK (5+) and repeating same on other environments.  
> >> What's the best way to download, extract, setup Solr & ZK in an 
> >> automated way 

Re: LIX readability index calculation by solr

2015-10-21 Thread Walter Underwood
Can you reload all the content?

If so, I would calculate this in an update request processor and put the result 
in its own field.
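A minimal sketch of how such a chain could be wired in solrconfig.xml, assuming a scripted processor; the chain name and script file name are made up, and the script itself (which would compute LIX from the text and add it as a new field) is left out:

```
<updateRequestProcessorChain name="add-lix">
  <!-- Hypothetical script: reads the text field, computes the LIX
       score, and adds it to the document as e.g. lix_f -->
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">compute-lix.js</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The update handler would then be pointed at this chain so every document gets its LIX field at index time.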

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 21, 2015, at 2:53 AM, Roland Szűcs  wrote:
> 
> Thanks, Toke, for your quick response. All your suggestions seem like very 
> good ideas. I also found the capital-letter rule strange because of names and 
> places, so I will skip that part, as I do not need an absolute measure, just 
> a ranked order among my documents.
> 
> cheers,
> Roland
> 
> 
> 
> 2015. okt. 21. dátummal, 11:25 időpontban Toke Eskildsen 
>  írta:
> 
>> Roland Szűcs  wrote:
>>> My use case is that I have to calculate the LIX readability index for my
>>> documents.
>> [...]
>>> *B* = Number of periods (defined by period, colon or capital first letter)
>> [...]
>>> Does anybody have idea how to get the number of "periods"?
>> 
>> As the positions do not matter, you could make a copyField containing only 
>> punctuation. And maybe extend it with a replace filter so that you have dot, 
>> comma, colon, bang, question etc. instead of .,:!?
>> 
>> The capital first letter seems a bit strange to me - what about names? But 
>> anyway, you could do it with a PatternReplaceCharFilter, matching on 
>> something like 
>> ([^.,:!?]\p{Space}*\p{Upper})|(^\p{Upper})
>> and replacing with 'capital' (the regexp above probably fails - it was just 
>> from memory).
>> 
>> - Toke Eskildsen
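As an aside, the LIX arithmetic itself is cheap to sanity-check with plain shell tools before deciding how to index it; a rough sketch with a made-up sample sentence (A = words, B = "periods", C = long words, integer arithmetic only):

```shell
TEXT="Solr is fast. It scales well. Indexing, querying: both are quick!"

# A = number of words, B = sentence terminators, C = words longer than 6 chars
A=$(printf '%s\n' "$TEXT" | tr -s ' ' '\n' | wc -l)
B=$(printf '%s' "$TEXT" | grep -o '[.:!?]' | wc -l)
C=$(printf '%s\n' "$TEXT" | tr -s ' ' '\n' | tr -d '[:punct:]' | awk 'length > 6' | wc -l)

# LIX = A/B + 100*C/A (integer approximation is fine for ranking)
echo "A=$A B=$B C=$C LIX=$((A / B + 100 * C / A))"
```

Since only a ranked order is needed, the integer approximation loses nothing of consequence.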



Re: Efficiency of integer storage/use

2015-10-21 Thread Upayavira
What I'd say is that there are *substantial* optimisations done already
when indexing terms, especially numerical ones, e.g. looking for common
divisors. Look out for a talk by Adrien Grand at Berlin Buzzwords
earlier this year for a taste of it.

I don't know how much of this kind of optimisation has been done on doc
values. I suspect not much yet (e.g. when committing a set of integer
values into a new segment, look to see how big these values are and set
the number of bits in the array accordingly).
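The size of the win is easy to estimate; a back-of-envelope sketch using the 0-255 range from the thread, with the pack-by-largest-value scheme above taken as an assumption about what a codec could do:

```shell
# Bits per value if a codec packs by the largest value in the segment,
# compared against a raw 32-bit int. 255 is the max of the 0-255 range.
max=255
bits=$(awk -v m="$max" 'BEGIN { b = 1; while (2^b - 1 < m) b++; print b }')
echo "max=$max -> $bits bits/value, $((100 - bits * 100 / 32))% smaller than 32-bit"
```

For a max of 255 this works out to 8 bits per value, i.e. a 75% saving over raw 32-bit storage, which is the scale of gain such a codec change could offer.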

If you want to make these sorts of optimisations, I'd suggest looking at
how docvalues are coded on disk, and see if you can make changes that
would benefit all users across the board.

Upayavira

On Wed, Oct 21, 2015, at 08:52 AM, Robert Krüger wrote:
> Thanks everyone, for your answers. I will probably make a simple
> parametric
> test pumping a solr index full of those integers with very limited range
> and then sorting by vector distances to see how the performance
> characteristics are.
> 
> On Sun, Oct 18, 2015 at 9:08 PM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
> 
> > Robert,
> > From what I know, both the inverted index and docvalues compress content
> > well; even stored fields are compressed. So I think you have a good chance
> > of experimenting successfully. You might need to tweak the schema to
> > disable storing unnecessary info in the index.
> >
> > On Sat, Oct 17, 2015 at 1:15 AM, Robert Krüger 
> > wrote:
> >
> > > Thanks for the feedback.
> > >
> > > What I am trying to do is to "abuse" integers to store 8bit (or even
> > lower)
> > > values of metrics I use for content-based image/video search (such as
> > > statistical values regarding color distribution) and then implement
> > > similarity calculations based on formulas using vector distances. The
> > Index
> > > can become large (tens of millions of documents each with say 50-100
> > > integers  describing the image metrics). I am looking at using a part of
> > > those metrics for selecting a subset of images using range queries and
> > then
> > > more for sorting the result set by relevance.
> > >
> > > I was first looking at implementing those metrics as binary fields (see
> > > other posting) and then use a custom function for the distance
> > calculation
> > > but so far I got the impression that way is not supported really well by
> > > Solr. Base64-En/Decoding would kill performance and implementing a custom
> > > field type with all that is probably required for that to work properly
> > is
> > > currently beyond my Solr knowledge. Besides, using built-in Solr features
> > > makes it easier to finetune/experiment with different approaches,
> > because I
> > > can just play around with different queries and see what works best,
> > > without each time adjusting a custom function.
> > >
> > > I hope that provides a better picture of what I am trying to achieve.
> > >
> > > Best,
> > >
> > > Robert
> > >
> > > On Fri, Oct 16, 2015 at 4:50 PM, Erick Erickson  > >
> > > wrote:
> > >
> > > > Under the covers, Lucene stores ints in a packed format, so I'd just
> > > count
> > > > on that for a first pass.
> > > >
> > > > What is "a lot of integer values"? Hundreds of millions? Billions?
> > > > Trillions?
> > > >
> > > > Unless you give us some indication of scale, it's hard to say anything
> > > > helpful. But unless you have some evidence that you're going to blow out
> > > > memory I'd just ignore the "wasted" bits. Especially if you can use
> > > > docValues,
> > > > that option holds much of the underlying data in MMapDirectory
> > > > that uses swappable OS memory
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > On Fri, Oct 16, 2015 at 1:53 AM, Robert Krüger 
> > > > wrote:
> > > > > Hi,
> > > > >
> > > > > I have a data model where I would store and index a lot of integer
> > > values
> > > > > with a very restricted range (e.g. 0-255), so theoretically the 32
> > bits
> > > > of
> > > > > Solr's integer fields are complete overkill. I want to be able to to
> > > > things
> > > > > like vector distance calculations on those fields. Should I worry
> > about
> > > > the
> > > > > "wasted" bits or will Solr compress/organize the index in a way that
> > > > > compensates for this if there are only 256 (or even fewer) distinct
> > > > values?
> > > > >
> > > > > Any recommendations on how my fields should be defined to make things
> > > > like
> > > > > numeric functions work as fast as technically possible?
> > > > >
> > > > > Thanks in advance,
> > > > >
> > > > > Robert
> > > >
> > >
> > >
> > >
> > > --
> > > Robert Krüger
> > > Managing Partner
> > > Lesspain GmbH & Co. KG
> > >
> > > www.lesspain-software.com
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> > 
> >
> 
> 
> 
> -- 
> Robert Krüger
> Managing Partner
> Lesspain GmbH & 

RE: DIH Caching with Delta Import

2015-10-21 Thread Dyer, James
The DIH Cache feature does not work with delta import.  Actually, much of DIH 
does not work with delta import.  The workaround you describe is similar to the 
approach described here: 
https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport , which 
in my opinion is the best way to implement partial updates with DIH.
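The pattern on that page boils down to one full-import query whose WHERE clause degrades to a delta filter; a minimal sketch (the driver, table, and column names here are made up):

```
<dataConfig>
  <dataSource driver="org.postgresql.Driver" url="jdbc:postgresql://db/example"/>
  <document>
    <entity name="item" pk="ID"
            query="SELECT id, name FROM item
                   WHERE '${dataimporter.request.clean}' != 'false'
                      OR last_modified &gt; '${dataimporter.last_index_time}'">
    </entity>
  </document>
</dataConfig>
```

Running /dataimport?command=full-import&clean=false then re-imports only the rows changed since the last run, and the cache-unfriendly delta-import code path is avoided entirely.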

James Dyer
Ingram Content Group

-Original Message-
From: Todd Long [mailto:lon...@gmail.com] 
Sent: Tuesday, October 20, 2015 8:02 PM
To: solr-user@lucene.apache.org
Subject: DIH Caching with Delta Import

It appears that DIH entity caching (e.g. SortedMapBackedCache) does not work
with deltas... is this simply a bug with the DIH cache support or somehow by
design?

Any ideas on a workaround for this? Ideally, I could just omit the
"cacheImpl" attribute but that leaves the query (using the default processor
in my case) without the appropriate where clause including the "cacheKey"
and "cacheLookup". Should SqlEntityProcessor be smart enough to ignore the
cache with deltas and simply append a where clause which includes the
"cacheKey" and "cacheLookup"? Or possibly just include a where clause which
includes ('${dih.request.command}' = 'full-import' or cacheKey =
cacheLookup)? I suppose those could be used to mitigate the issue but I was
hoping for possibly a better solution.

Any help would be greatly appreciated. Thank you.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Caching-with-Delta-Import-tp4235598.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DevOps question : auto deployment/setup of Solr & Zookeeper on medium-large clusters

2015-10-21 Thread Dhutia, Devansh
We are using AWS, and have standardized deployments using Chef. 

As Jeff points out below, Exhibitor is a good tool to deploy alongside 
ZooKeeper. We’ve had very good luck with it. 



On 10/20/15, 7:59 PM, "Jeff Wartes"  wrote:

>
>If you’re using AWS, there’s this:
>https://github.com/LucidWorks/solr-scale-tk
>If you’re using chef, there’s this:
>https://github.com/vkhatri/chef-solrcloud
>
>(There are several other chef cookbooks for Solr out there, but this is
>the only one I’m aware of that supports Solr 5.3.)
>
>For ZK, I’m less familiar, but if you’re using chef there’s this:
>https://github.com/SimpleFinance/chef-zookeeper
>And this might be handy to know about too:
>https://github.com/Netflix/exhibitor/wiki
>
>
>On 10/20/15, 6:37 AM, "Davis, Daniel (NIH/NLM) [C]" 
>wrote:
>
>>Waste of money in my opinion.   I would point you towards other tools -
>>bash scripts and free configuration managers such as puppet, chef, salt,
>>or ansible.Depending on what development you are doing, you may want
>>a continuous integration environment.   For a small company starting out,
>>using a free CI, maybe SaaS, is a good choice.   A professional version
>>such as Bamboo, TeamCity, Jenkins are almost essential in a large
>>enterprise if you are doing diverse builds.
>>
>>When you create a VM, you can generally specify a script to run after the
>>VM is mostly created.   There is a protocol (PXE Boot) that enables this
>>- a PXE server listens and hears that a new server with such-and-such
>>Ethernet Address is starting.   The PXE server makes it boot like a
>>CD-ROM/DVD install, booting from installation media on the network and
>>installing.Once that install is down, a custom script may be invoked.
>>  This script is typically a bash script, because you may not be able to
>>count on too much else being installed.   However, python/perl are also
>>reasonable choices - just be careful that the modules/libraries you are
>>using for the script are present.The same PXE protocol is used in
>>large on-premises installations (vCenter) and in the cloud (AWS/Digital
>>Ocean).  We don't care about the PXE server - the point is that you can
>>generally run a bash script after your install.
>>
>>The bash script can bootstrap other services such as puppet, chef, or
>>salt, and/or setup keys so that push configuration management tools such
>>as ansible can reach the server.   The bash script may even be smart
>>enough to do all of the setup you need, depending on what other servers
>>you need to configure.   Smart bash scripts are good for a small company,
>>but for large setups, I'd use puppet, chef, salt, and/or ansible.
>>
>>What I tend to do is to deploy things in such a way that puppet (because
>>it is what we use here) can setup things so that a "solradm" account can
>>setup everything else, and solr and zookeeper are running as a "solrapp"
>>user using puppet.Then, my continuous integration server, which is
>>Atlassian Bamboo (you can also use tools such as Jenkins, TeamCity,
>>BuildBot), installs solr as "solradm" and sets it up to run as "solrapp".
>>
>>I am not a systems administrator, and I'm not really in "DevOps", my job
>>is to be above all of that and do "systems architecture" which I am lucky
>>still involves coding both in system administration and applications
>>development.   So, that's my 2 cents.
>>
>>Dan Davis, Systems/Applications Architect (Contractor),
>>Office of Computer and Communications Systems,
>>National Library of Medicine, NIH
>>
>>-Original Message-
>>From: Susheel Kumar [mailto:susheel2...@gmail.com]
>>Sent: Tuesday, October 20, 2015 9:19 AM
>>To: solr-user@lucene.apache.org
>>Subject: DevOps question : auto deployment/setup of Solr & Zookeeper on
>>medium-large clusters
>>
>>Hello,
>>
>>Resending to see opinion from Dev-Ops perspective on the tools for
>>installing/deployment of Solr & ZK on large no of machines and
>>maintaining them. I have heard Bladelogic or HP OO (commercial tools)
>>etc. being used.
>>Please share your experience or pros / cons of such tools.
>>
>>Thanks,
>>Susheel
>>
>>On Mon, Oct 19, 2015 at 3:32 PM, Susheel Kumar 
>>wrote:
>>
>>> Hi,
>>>
>>> I am trying to find the best practises for setting up Solr on new 20+
>>> machines  & ZK (5+) and repeating same on other environments.  What's
>>> the best way to download, extract, setup Solr & ZK in an automated way
>>> along with other dependencies like java etc.  Among shell scripts or
>>> puppet or docker or imaged vm's what is being used & suggested from
>>> Dev-Ops perspective.
>>>
>>> Thanks,
>>> Susheel
>>>
>


Re: Efficiency of integer storage/use

2015-10-21 Thread Robert Krüger
Thanks everyone, for your answers. I will probably make a simple parametric
test pumping a solr index full of those integers with very limited range
and then sorting by vector distances to see how the performance
characteristics are.

On Sun, Oct 18, 2015 at 9:08 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Robert,
> From what I know, both the inverted index and docvalues compress content
> well; even stored fields are compressed. So I think you have a good chance
> of experimenting successfully. You might need to tweak the schema to
> disable storing unnecessary info in the index.
>
> On Sat, Oct 17, 2015 at 1:15 AM, Robert Krüger 
> wrote:
>
> > Thanks for the feedback.
> >
> > What I am trying to do is to "abuse" integers to store 8bit (or even
> lower)
> > values of metrics I use for content-based image/video search (such as
> > statistical values regarding color distribution) and then implement
> > similarity calculations based on formulas using vector distances. The
> Index
> > can become large (tens of millions of documents each with say 50-100
> > integers  describing the image metrics). I am looking at using a part of
> > those metrics for selecting a subset of images using range queries and
> then
> > more for sorting the result set by relevance.
> >
> > I was first looking at implementing those metrics as binary fields (see
> > other posting) and then use a custom function for the distance
> calculation
> > but so far I got the impression that way is not supported really well by
> > Solr. Base64-En/Decoding would kill performance and implementing a custom
> > field type with all that is probably required for that to work properly
> is
> > currently beyond my Solr knowledge. Besides, using built-in Solr features
> > makes it easier to finetune/experiment with different approaches,
> because I
> > can just play around with different queries and see what works best,
> > without each time adjusting a custom function.
> >
> > I hope that provides a better picture of what I am trying to achieve.
> >
> > Best,
> >
> > Robert
> >
> > On Fri, Oct 16, 2015 at 4:50 PM, Erick Erickson  >
> > wrote:
> >
> > > Under the covers, Lucene stores ints in a packed format, so I'd just
> > count
> > > on that for a first pass.
> > >
> > > What is "a lot of integer values"? Hundreds of millions? Billions?
> > > Trillions?
> > >
> > > Unless you give us some indication of scale, it's hard to say anything
> > > helpful. But unless you have some evidence that you're going to blow out
> > > memory I'd just ignore the "wasted" bits. Especially if you can use
> > > docValues,
> > > that option holds much of the underlying data in MMapDirectory
> > > that uses swappable OS memory
> > >
> > > Best,
> > > Erick
> > >
> > > On Fri, Oct 16, 2015 at 1:53 AM, Robert Krüger 
> > > wrote:
> > > > Hi,
> > > >
> > > > I have a data model where I would store and index a lot of integer
> > values
> > > > with a very restricted range (e.g. 0-255), so theoretically the 32
> bits
> > > of
> > > > Solr's integer fields are complete overkill. I want to be able to to
> > > things
> > > > like vector distance calculations on those fields. Should I worry
> about
> > > the
> > > > "wasted" bits or will Solr compress/organize the index in a way that
> > > > compensates for this if there are only 256 (or even fewer) distinct
> > > values?
> > > >
> > > > Any recommendations on how my fields should be defined to make things
> > > like
> > > > numeric functions work as fast as technically possible?
> > > >
> > > > Thanks in advance,
> > > >
> > > > Robert
> > >
> >
> >
> >
> > --
> > Robert Krüger
> > Managing Partner
> > Lesspain GmbH & Co. KG
> >
> > www.lesspain-software.com
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>



-- 
Robert Krüger
Managing Partner
Lesspain GmbH & Co. KG

www.lesspain-software.com


Re: result grouping on all documents

2015-10-21 Thread Emir Arnautovic

Hi Christian,
It seems to me that you can use range faceting to get counts.
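A sketch of such a request, reusing the field and endpoint from the quoted query below; the start/end/gap values are assumptions to be tuned to the real time slots:

```
http://localhost:8014/solr/myCore/query?q=*:*&rows=0
    &facet=true
    &facet.range=modified
    &facet.range.start=201103010
    &facet.range.end=201502010
    &facet.range.gap=10000
```

One facet.range definition replaces the N separate group queries, and rows=0 skips document retrieval entirely, since only the per-bucket counts are needed.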

Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On 20.10.2015 17:05, Christian Reuschling wrote:

Hi,

we try to get the number of documents for given time slots in the index 
efficiently.


For this, we query the solr index like this:

http://localhost:8014/solr/myCore/query?q=*:*&rows=1&fl=id&group=true&group.query=modified:[201103010%20TO%20201302010]&group.query=modified:[201303010%20TO%20201502010]&group.limit=1&group.main=false

for now, the modified field is a number field with trie index (tlong in 
schema.xml).

We have about 30M documents in the index.

This query works fine, but if the number of group queries gets higher (e.g. 
200), the response time
gets terribly slow.
As we need only the number of documents per group and never the score, or some 
other data of the
documents, we are wondering if there is a faster method to get this information.


Thanks

Christian



How to get the join data by multiple cores?

2015-10-21 Thread Shuhei Suzuki
hello,
How can I express a query like the following SQL in Solr?

 SELECT child.*, parent.*
 FROM child
 JOIN parent ON child.parent_id = parent.id
 WHERE parent.tag = 'hoge'

Child and parent are in a many-to-one relationship (each parent can have
many children).
I tried this, but it does not work.

 /select/?q={!join from=parent_id to=id fromIndex=parent}id:1+tag:hoge





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-the-join-data-by-multiple-cores-tp4235799.html
Sent from the Solr - User mailing list archive at Nabble.com.


Index Multiple entity in one collection core

2015-10-21 Thread anurupborah2001
HI,
I am having difficulty indexing multiple entities in one
collection. When I try to index, only the entity defined last gets
indexed. Please assist, as I am having a hard time solving it.
The below are the config :
--
data-config.xml
--




  

























schema.xml
--
  


singlekey
   















Please kindly help me with this. I am not able to index table 1; instead,
table 2 and table 3 (which are one-to-many relationship tables) are
getting indexed, but table 1 is not.

Thanks for help in advance
Regards
Anurup






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-Multiple-entity-in-one-collection-core-tp4235810.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to get the join data by multiple cores?

2015-10-21 Thread cai xingliang
{!join fromIndex=parent from=id to=parent_id}tag:hoge

That should work.
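One caveat worth adding: a cross-core join returns only documents from the core being queried (child here), so parent fields can be used for filtering but cannot be returned. A sketch of the full request, using the core names from the thread:

```
/child/select?q={!join fromIndex=parent from=id to=parent_id}tag:hoge&fl=*
```

If parent fields are needed in the results, they have to be denormalized into the child documents at index time.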
On Oct 22, 2015 12:35 PM, "Shuhei Suzuki"  wrote:

> hello,
> How can I express a query like the following SQL in Solr?
>
>  SELECT child.*, parent.*
>  FROM child
>  JOIN parent ON child.parent_id = parent.id
>  WHERE parent.tag = 'hoge'
>
> Child and parent are in a many-to-one relationship (each parent can have
> many children).
> I tried this, but it does not work.
>
>  /select/?q={!join from=parent_id to=id fromIndex=parent}id:1+tag:hoge
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-get-the-join-data-by-multiple-cores-tp4235799.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Wildcard "?" ?

2015-10-21 Thread Bruno Mannina

Dear Solr-user,

I'm surprised to see in my Solr 5.0 that the wildcard ? always matches
exactly 1 character.

my request is:

title:magnet? AND tire?

 Solr found only titles with a character after magnet and tire, but did not
find titles with only magnet AND tire.


Do you know how I can tell Solr that the ? wildcard means [0, 1]
characters and not exactly [1] character?
Is it possible?


Thanks a lot !

my field in my schema is defined like that:


Field: title
Field-Type: org.apache.solr.schema.TextField
PI Gap: 100

Flags:      Indexed  Tokenized  Stored  Multivalued
Properties     y         y        y         y
Schema         y         y        y         y
Index          y         y        y

Analyzers (index and query): org.apache.solr.analysis.TokenizerChain




---
L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel 
antivirus Avast.
http://www.avast.com


Re: [newbie] Configuration for SolrCloud + DataImportHandler

2015-10-21 Thread Walter Underwood
Does the collection reload do a rolling reload of each node or does it do them 
all at once? We were planning on using the core reload on each system, one at a 
time. That would make sure the collection stays available.

I read the documentation, it didn’t say anything about that.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 21, 2015, at 8:36 AM, Erick Erickson  wrote:
> 
> Please be very careful using the core admin UI for anything related to
> SolrCloud. In fact, I try to avoid using it at all.
> 
> The reason is that it is very low-level, and it is very easy to use it
> incorrectly. For instance, reloading a core in a multi-replica setup
> (doesnt matter whether it's several shards or just a single shard with
> multiple replicas) will reload _only_ that core, leaving the other
> replicas in your collection with the old configs.
> 
> Always use the collections API if at all possible, see:
> https://cwiki.apache.org/confluence/display/solr/Collections+API
> 
> Best,
> Erick
> 
> On Wed, Oct 21, 2015 at 1:02 AM, Hangu Choi  wrote:
>> Mikhail,
>> I solved the problem: I ran putfile with the wrong path. /synonyms.txt should be
>> /configs/gettingstarted/synonyms.txt .
>> 
>> 
>> Regards,
>> Hangu
>> 
>> On Wed, Oct 21, 2015 at 4:17 PM, Hangu Choi  wrote:
>> 
>>> Mikhail,
>>> 
>>> I didn't understand that that's what I needed to do. Thank you.
>>> 
>>> But so far I am not doing well.
>>> I am testing a configuration change in SolrCloud with this command:
>>> 
>>> ./zkcli.sh -zkhost localhost:9983 -cmd putfile /synonyms.txt
>>> /usr/local/solr-5.3.1-test/server/scripts/cloud-scripts/synonyms.txt
>>> and no error message occurred.
>>> 
>>> I then reloaded Solr via the core admin at localhost:8983,
>>> then checked the synonyms.txt file at localhost:8983/solr/#/~cloud?view=tree,
>>> but nothing happened. What's wrong?
>>> 
>>> 
>>> 
>>> 
>>> Regards,
>>> Hangu
>>> 
>>> On Tue, Oct 20, 2015 at 9:18 PM, Mikhail Khludnev <
>>> mkhlud...@griddynamics.com> wrote:
>>> 
 did you try something like
 $> zkcli.sh -zkhost localhost:2181 -cmd putfile /solr.xml
 /path/to/solr.xml
 ?
 
 On Mon, Oct 19, 2015 at 11:15 PM, hangu choi  wrote:
 
> Hi,
> 
> I am trying to start SolrCloud with embedded ZooKeeper.
> 
> I know how to config solrconfig.xml and schema.xml, and other things for
> data import handler.
> but when I try to configure it with SolrCloud, I don't know where to
 start.
> 
> I know there is no conf directory in SolrCloud because conf directory
 are
> stored in ZooKeeper.
> Then, how can I config that? I read this (
> 
> 
 https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files
> )
> but I failed to understand.
> 
> I need to config solrconfig.xml and schema.xml for my custom schema.
> 
> 
> Regards,
> Hangu
> 
 
 
 
 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics
 
 
 
 
>>> 
>>> 



Re: Wildcard "?" ?

2015-10-21 Thread Upayavira
No, you cannot tell Solr to handle wildcards differently. However, you
can use regular expressions for searching:

title:/magnet.?/ should do it.
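The difference is easy to see with an ordinary regex on the command line, as an analogy only (this is grep, not a Solr query): Lucene's ? wildcard behaves like the regex ., while the regex query's .? makes the character optional.

```shell
printf 'magnet\nmagnets\nmagnetic\n' > /tmp/titles.txt

# Like the "?" wildcard: exactly one extra character is required.
grep -xE 'magnet.' /tmp/titles.txt    # matches only "magnets"

# Like the regex query magnet.? : the extra character is optional.
grep -xE 'magnet.?' /tmp/titles.txt   # matches "magnet" and "magnets"

rm -f /tmp/titles.txt
```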

Upayavira

On Wed, Oct 21, 2015, at 11:35 AM, Bruno Mannina wrote:
> Dear Solr-user,
> 
> I'm surprised to see in my Solr 5.0 that the wildcard ? always matches
> exactly 1 character.
> 
> my request is:
> 
> title:magnet? AND tire?
> 
>   Solr found only titles with a character after magnet and tire, but did
> not find
> titles with only magnet AND tire.
> 
> 
> Do you know how I can tell Solr that the ? wildcard means [0, 1]
> characters and not exactly [1] character?
> Is it possible?
> 
> 
> Thanks a lot !
> 
> my field in my schema is defined like that:
> 
> 
> Field: title
> Field-Type: org.apache.solr.schema.TextField
> PI Gap: 100
>
> Flags:      Indexed  Tokenized  Stored  Multivalued
> Properties     y         y        y         y
> Schema         y         y        y         y
> Index          y         y        y
>
> Analyzers (index and query): org.apache.solr.analysis.TokenizerChain
> 
> 
> 
> 


Re: [newbie] Configuration for SolrCloud + DataImportHandler

2015-10-21 Thread Erick Erickson
Please be very careful using the core admin UI for anything related to
SolrCloud. In fact, I try to avoid using it at all.

The reason is that it is very low-level, and it is very easy to use it
incorrectly. For instance, reloading a core in a multi-replica setup
(doesn't matter whether it's several shards or just a single shard with
multiple replicas) will reload _only_ that core, leaving the other
replicas in your collection with the old configs.

Always use the collections API if at all possible, see:
https://cwiki.apache.org/confluence/display/solr/Collections+API

Best,
Erick

On Wed, Oct 21, 2015 at 1:02 AM, Hangu Choi  wrote:
> Mikhail,
> I solved the problem; I had put the file at the wrong path. /synonyms.txt should be
> /configs/gettingstarted/synonyms.txt.
>
>
> Regards,
> Hangu
>
> On Wed, Oct 21, 2015 at 4:17 PM, Hangu Choi  wrote:
>
>> Mikhail,
>>
>> I didn't understand that's what I needed to do. Thank you.
>>
>> but at the first moment, I am not doing well..
>> I am testing to change configuration in solrcloud, through this command
>>
>> ./zkcli.sh -zkhost localhost:9983 -cmd putfile /synonyms.txt
>> /usr/local/solr-5.3.1-test/server/scripts/cloud-scripts/synonyms.txt
>> and no error message occurred.
>>
>> and then I reloaded solr at localhost:8983 coreAdmin.
>> then I checked synonyms.txt file at localhost:8983/solr/#/~cloud?view=tree
>> but nothing happened. What's wrong?
>>
>>
>>
>>
>> Regards,
>> Hangu
>>
>> On Tue, Oct 20, 2015 at 9:18 PM, Mikhail Khludnev <
>> mkhlud...@griddynamics.com> wrote:
>>
>>> did you try something like
>>> $> zkcli.sh -zkhost localhost:2181 -cmd putfile /solr.xml
>>> /path/to/solr.xml
>>> ?
>>>
>>> On Mon, Oct 19, 2015 at 11:15 PM, hangu choi  wrote:
>>>
>>> > Hi,
>>> >
>>> > I am trying to start SolrCloud with embedded ZooKeeper.
>>> >
>>> > I know how to config solrconfig.xml and schema.xml, and other things for
>>> > data import handler.
>>> > but when I trying to config it with solrCloud, I don't know where to
>>> start.
>>> >
>>> > I know there is no conf directory in SolrCloud because conf directory
>>> are
>>> > stored in ZooKeeper.
>>> > Then, how can I config that? I read this (
>>> >
>>> >
>>> https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files
>>> > )
>>> > but I failed to understand.
>>> >
>>> > I need to config solrconfig.xml and schema.xml for my custom schema.
>>> >
>>> >
>>> > Regards,
>>> > Hangu
>>> >
>>>
>>>
>>>
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>> Principal Engineer,
>>> Grid Dynamics
>>>
>>> 
>>> 
>>>
>>
>>


Re: Wildcard "?" ?

2015-10-21 Thread Bruno Mannina

title:/magnet.?/ doesn't work for me because Solr answers:

|title = "Magnetic folding system"|

but thanks for giving me the idea to use regexp!

On 21/10/2015 18:46, Upayavira wrote:

No, you cannot tell Solr to handle wildcards differently. However, you
can use regular expressions for searching:

title:/magnet.?/ should do it.

Upayavira

On Wed, Oct 21, 2015, at 11:35 AM, Bruno Mannina wrote:

Dear Solr-user,

I'm surprised to see in my Solr 5.0 that the wildcard ? always replaces
exactly 1 character.

my request is:

title:magnet? AND tire?

   Solr finds only titles with a character after magnet and tire, but it doesn't
find titles with only magnet AND tire.


Do you know how I can tell Solr that the ? wildcard should mean [0, 1]
characters and not exactly [1] character?
Is it possible?


Thanks a lot !

my field in my schema is defined like that:


Field: title
Field-Type: org.apache.solr.schema.TextField
PI Gap: 100

Flags:      Indexed  Tokenized  Stored  Multivalued
Properties  y        y          y       y
Schema      y        y          y       y
Index       y        y          y

Index Analyzer: org.apache.solr.analysis.TokenizerChain
Query Analyzer: org.apache.solr.analysis.TokenizerChain












Re: LIX readability index calculation by solr

2015-10-21 Thread Roland Szűcs
Hi Wunder, 

Yes, I can reload the documents; it takes at most 2-3 hours. I have never used an
update request processor, but I will check it on the Solr wiki. Thanks for your help.

Cheers,
Roland



On Oct 21, 2015, at 17:25, Walter Underwood
 wrote:

> Can you reload all the content?
> 
> If so, I would calculate this in an update request processor and put the 
> result in its own field.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Oct 21, 2015, at 2:53 AM, Roland Szűcs  wrote:
>> 
>> Thanks, Toke, for your quick response. All your suggestions seem to be very good
>> ideas. I found the capital letters strange too, because of names and places,
>> so I will skip this part, as I do not need an absolute measure, just a ranked
>> order among my documents.
>> 
>> cheers,
>> Roland
>> 
>> 
>> 
>> On Oct 21, 2015, at 11:25, Toke Eskildsen
>>  wrote:
>> 
>>> Roland Szűcs  wrote:
 My use case is that I have to calculate the LIX readability index for my
 documents.
>>> [...]
 *B* = Number of periods (defined by period, colon or capital first letter)
>>> [...]
 Does anybody have idea how to get the number of "periods"?
>>> 
>>> As the positions do not matter, you could make a copyField containing
>>> only punctuation. And maybe extend it with a replace filter so that you have
>>> dot, comma, colon, bang, question etc. instead of .,:!?
>>> 
>>> The capital first letter seems a bit strange to me - what about names? But 
>>> anyway, you could do it with a PatternReplaceCharFilter, matching on 
>>> something like 
>>> ([^.,:!?]\p{Space}*\p{Upper})|(^\p{Upper})
>>> and replacing with 'capital' (the regexp above probably fails - it was just 
>>> from memory).
>>> 
>>> - Toke Eskildsen
> 


[newbie] questions about 3.6.0 and 4.x or 5.x ?

2015-10-21 Thread Robert Hume
Hello, I'm hoping to get some quick advice from the Solr gurus out there ...



I’ve inherited a project that uses a Solr 3.6.0 deployment.   (Several
masters and several slaves – I think there are 6 Solr instances in total.)



I’ve been tasked with investigating if upgrading our 3.6.0 deployment will
improve performance – there’s a lot of data and things are getting slow,
apparently.



I’ve read Apache docs that from 3.6.x to 4.x there were improvements in
scalability and performance.



I see that from 4.x to 5.x Solr is now a standalone server and no
longer just a WAR running on Tomcat.





QUESTIONS:


A. Is it worth upgrading to 4.x or 5.x?  Will I see a big improvement in
performance?



B. Should I go to 4.x or 5.x?  Will 4.x be an easier upgrade path since
it's just a new WAR file?



C. In a nutshell ... what will the upgrade path look like, what kind of
steps am I in for, and how can I avoid trouble?




Any help is GREATLY appreciated!!


Rob


Re: [newbie] questions about 3.6.0 and 4.x or 5.x ?

2015-10-21 Thread Shawn Heisey
On 10/21/2015 12:41 PM, Robert Hume wrote:
> I've inherited a project that uses a Solr 3.6.0 deployment.   (Several
> masters and several slaves – I think there are 6 Solr instances in total.)
>
> I've been tasked with investigating if upgrading our 3.6.0 deployment will
> improve performance – there’s a lot of data and things are getting slow,
> apparently.
>
> I've read Apache docs that from 3.6.x to 4.x there were improvements in
> scalability and performance.

Performance does get better in newer versions, but for most use cases,
there is NOT a night/day difference, just a minor speedup.  Upgrading
*might* help, but even if it does, chances are that it will not
completely solve the problem.

The most common reason for Solr performance problems is that there is
not enough memory.  That might mean the java heap is a little too small,
but more frequently, it means that there's not enough memory in the
server to cache the index contents effectively.

General information:

https://wiki.apache.org/solr/SolrPerformanceProblems

Solr 3.6.x is very solid software, despite its age.  The newest version
is (IMHO) better, but if 3.x (3.6.2 in particular) meets your needs, you
can keep using it.  Solr 3.x can run with a very ancient version of Java
-- version 5!  I believe that it still works even in Java 8.

> I see that from 4.x to 5.x Solr is now a standalone server and no
> longer just a WAR running on Tomcat.

Yes.  There's a lot that could be said about that topic.  The highlights
are here:

https://wiki.apache.org/solr/WhyNoWar

Thanks,
Shawn



Re: [newbie] questions about 3.6.0 and 4.x or 5.x ?

2015-10-21 Thread Erick Erickson
To chime in, in certain cases the memory requirements for 4x (and 5x) are _much_
improved, see: 
https://lucidworks.com/blog/2012/04/06/memory-comparisons-between-solr-3x-and-trunk/

But as Shawn says, it's not a magic bullet.

Solr 5 requires Java 7, so that's one thing to be aware of. Plus, you
either have to upgrade
your indexes to 4.x, then install/upgrade to 5x or if you want to jump
straight from
3x to 5x, you need to re-index from scratch; Solr 5x will not read an
index created with
Solr 3x.

And rather than have a master/slave setup you'll probably want to
migrate to SolrCloud
as well, it's much easier to create/manage a cluster with shards with SolrCloud.

Best,
Erick



On Wed, Oct 21, 2015 at 12:28 PM, Shawn Heisey  wrote:
> On 10/21/2015 12:41 PM, Robert Hume wrote:
>> I've inherited a project that uses a Solr 3.6.0 deployment.   (Several
>> masters and several slaves – I think there are 6 Solr instances in total.)
>>
>> I've been tasked with investigating if upgrading our 3.6.0 deployment will
>> improve performance – there’s a lot of data and things are getting slow,
>> apparently.
>>
>> I've read Apache docs that from 3.6.x to 4.x there were improvements in
>> scalability and performance.
>
> Performance does get better in newer versions, but for most use cases,
> there is NOT a night/day difference, just a minor speedup.  Upgrading
> *might* help, but even if it does, chances are that it will not
> completely solve the problem.
>
> The most common reason for Solr performance problems is that there is
> not enough memory.  That might mean the java heap is a little too small,
> but more frequently, it means that there's not enough memory in the
> server to cache the index contents effectively.
>
> General information:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> Solr 3.6.x is very solid software, despite its age.  The newest version
> is (IMHO) better, but if 3.x (3.6.2 in particular) meets your needs, you
> can keep using it.  Solr 3.x can run with a very ancient version of Java
> -- version 5!  I believe that it still works even in Java 8.
>
>> I see that from 4.x to 5.x Solr is now a standalone server and no
>> longer just a WAR running on Tomcat.
>
> Yes.  There's a lot that could be said about that topic.  The highlights
> are here:
>
> https://wiki.apache.org/solr/WhyNoWar
>
> Thanks,
> Shawn
>


Re: `cat /dev/null > solr-8983-console.log` frees host's memory

2015-10-21 Thread Rajani Maski
The details in this link [1] might be of help.

[1]https://support.lucidworks.com/hc/en-us/articles/207072137

On Wed, Oct 21, 2015 at 7:42 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Eric,
> As Shawn explained, memory is freed because it was used to cache portion
> of log file.
>
> Since you are already with Sematext, I guess you are aware, but doesn't
> hurt to remind you that we also have Logsene that you can use to manage
> your logs: http://sematext.com/logsene/index.html
>
> Thanks,
> Emir
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
>
> On 20.10.2015 17:42, Shawn Heisey wrote:
>
>> On 10/20/2015 9:19 AM, Eric Torti wrote:
>>
>>> I had a 52GB solr-8983-console.log on my Solr 5.2.1 Amazon Linux
>>> 64-bit box and decided to `cat /dev/null > solr-8983-console.log` to
>>> free space.
>>>
>>> The weird thing is that when I checked Sematext I noticed the OS had
>>> freed a lot of memory at the same exact instant I did that.
>>>
>> On that memory graph, the legend doesn't indicate which of the graph
>> colors represent each of the four usage types at the top -- they all
>> have blue checkboxes, so I can't tell for sure what changed.
>>
>> If the number that dropped is "cached" (which I think is likely) then
>> everything is working exactly as it should.  The OS had simply cached a
>> large chunk of the logfile, exactly as it is designed to do, and once
>> the file was deleted, it stopped reserving that memory and made it
>> available.
>>
>> https://en.wikipedia.org/wiki/Page_cache
>>
>> Thanks,
>> Shawn
>>
>>
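
[As an aside, the truncation itself can be reproduced and verified with a couple of shell commands; the temp file below stands in for the console log, and the sizes are illustrative:]

```shell
log=$(mktemp)
seq 1 100000 > "$log"   # simulate a console log that has grown large
wc -c < "$log"          # non-zero size
: > "$log"              # truncate in place; same effect as `cat /dev/null > "$log"`
wc -c < "$log"          # prints 0 -- the blocks, and any page cache holding them, are released
rm -f "$log"
```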


Re: Wildcard "?" ?

2015-10-21 Thread Upayavira
regexp will match the whole term. So, if you have stemming on, magnetic
may well stem to magnet, and that is the term against which the regexp
is executed.

If you want to do the regexp against the whole field, then you need to
do it against a string version of that field.

The process of using a regexp (and a wildcard for that matter) is:
 * search through the list of terms in your field for terms that match
 your regexp (uses an FST for speed)
 * search for documents that contain those resulting terms

Upayavira
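
[To see why the match can succeed even though the stored title is "Magnetic folding system", here is a small Python sketch; the analysis output below is hypothetical, assuming lowercasing plus stemming at index time:]

```python
import re

stored = "Magnetic folding system"
# Hypothetical terms after lowercasing and stemming at index time
indexed_terms = ["magnet", "fold", "system"]

# Solr evaluates the regexp against each indexed term, not the stored text
query = re.compile(r"magnet.?")
matches = [t for t in indexed_terms if query.fullmatch(t)]
print(matches)   # ['magnet'] -> the document matches
```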

On Wed, Oct 21, 2015, at 12:08 PM, Bruno Mannina wrote:
> title:/magnet.?/ doesn't work for me because Solr answers:
> 
> |title = "Magnetic folding system"|
> 
> but thanks for giving me the idea to use regexp!
> 
> On 21/10/2015 18:46, Upayavira wrote:
> > No, you cannot tell Solr to handle wildcards differently. However, you
> > can use regular expressions for searching:
> >
> > title:/magnet.?/ should do it.
> >
> > Upayavira
> >
> > On Wed, Oct 21, 2015, at 11:35 AM, Bruno Mannina wrote:
> >> Dear Solr-user,
> >>
> >> I'm surprised to see in my Solr 5.0 that the wildcard ? always replaces
> >> exactly 1 character.
> >>
> >> my request is:
> >>
> >> title:magnet? AND tire?
> >>
> >>    Solr finds only titles with a character after magnet and tire, but it doesn't
> >> find titles with only magnet AND tire.
> >>
> >>
> >> Do you know how I can tell Solr that the ? wildcard should mean [0, 1]
> >> characters and not exactly [1] character?
> >> Is it possible?
> >>
> >>
> >> Thanks a lot !
> >>
> >> my field in my schema is defined like that:
> >>
> >>
> >> Field: title
> >> Field-Type: org.apache.solr.schema.TextField
> >> PI Gap: 100
> >>
> >> Flags:      Indexed  Tokenized  Stored  Multivalued
> >> Properties  y        y          y       y
> >> Schema      y        y          y       y
> >> Index       y        y          y
> >>
> >> Index Analyzer: org.apache.solr.analysis.TokenizerChain
> >> Query Analyzer: org.apache.solr.analysis.TokenizerChain
> >>
> >>
> >>
> >>
> >
> 
> 
> 


Re: [newbie] Configuration for SolrCloud + DataImportHandler

2015-10-21 Thread Erick Erickson
Hmmm, not entirely sure. It's perfectly reasonable to use the core
admin API, just be careful with it, especially for things like reload;
it's pretty easy to leave your cluster in an inconsistent state.

Looks like the collections RELOAD command sends requests out to all
replicas at once.

Under the covers, though, it just calls the core admin API (with the
correct parameters)
to carry out whatever is required. There's no intrinsic reason you
can't do this yourself
for each and every replica; you just have to ensure that you get them _all_.

FWIW
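
[For reference, a collection-level reload is a plain HTTP call to the Collections API; a minimal sketch, where the host and the collection name "gettingstarted" are assumptions:]

```python
import urllib.parse
# import urllib.request  # uncomment to actually issue the call

base = "http://localhost:8983/solr/admin/collections"
params = {"action": "RELOAD", "name": "gettingstarted", "wt": "json"}
url = base + "?" + urllib.parse.urlencode(params)
print(url)
# -> http://localhost:8983/solr/admin/collections?action=RELOAD&name=gettingstarted&wt=json
# urllib.request.urlopen(url)  # would reload every replica of the collection
```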

On Wed, Oct 21, 2015 at 9:00 AM, Walter Underwood  wrote:
> Does the collection reload do a rolling reload of each node or does it do 
> them all at once? We were planning on using the core reload on each system, 
> one at a time. That would make sure the collection stays available.
>
> I read the documentation, it didn’t say anything about that.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Oct 21, 2015, at 8:36 AM, Erick Erickson  wrote:
>>
>> Please be very careful using the core admin UI for anything related to
>> SolrCloud. In fact, I try to avoid using it at all.
>>
>> The reason is that it is very low-level, and it is very easy to use it
>> incorrectly. For instance, reloading a core in a multi-replica setup
>> (doesnt matter whether it's several shards or just a single shard with
>> multiple replicas) will reload _only_ that core, leaving the other
>> replicas in your collection with the old configs.
>>
>> Always use the collections API if at all possible, see:
>> https://cwiki.apache.org/confluence/display/solr/Collections+API
>>
>> Best,
>> Erick
>>
>> On Wed, Oct 21, 2015 at 1:02 AM, Hangu Choi  wrote:
>>> Mikhail,
>>> I solved the problem; I had put the file at the wrong path. /synonyms.txt should be
>>> /configs/gettingstarted/synonyms.txt.
>>>
>>>
>>> Regards,
>>> Hangu
>>>
>>> On Wed, Oct 21, 2015 at 4:17 PM, Hangu Choi  wrote:
>>>
 Mikhail,

 I didn't understand that's what I needed to do. Thank you.

 but at the first moment, I am not doing well..
 I am testing to change configuration in solrcloud, through this command

 ./zkcli.sh -zkhost localhost:9983 -cmd putfile /synonyms.txt
 /usr/local/solr-5.3.1-test/server/scripts/cloud-scripts/synonyms.txt
 and no error message occurred.

 and then I reloaded solr at localhost:8983 coreAdmin.
 then I checked synonyms.txt file at localhost:8983/solr/#/~cloud?view=tree
 but nothing happened. What's wrong?




 Regards,
 Hangu

 On Tue, Oct 20, 2015 at 9:18 PM, Mikhail Khludnev <
 mkhlud...@griddynamics.com> wrote:

> did you try something like
> $> zkcli.sh -zkhost localhost:2181 -cmd putfile /solr.xml
> /path/to/solr.xml
> ?
>
> On Mon, Oct 19, 2015 at 11:15 PM, hangu choi  wrote:
>
>> Hi,
>>
>> I am trying to start SolrCloud with embedded ZooKeeper.
>>
>> I know how to config solrconfig.xml and schema.xml, and other things for
>> data import handler.
>> but when I trying to config it with solrCloud, I don't know where to
> start.
>>
>> I know there is no conf directory in SolrCloud because conf directory
> are
>> stored in ZooKeeper.
>> Then, how can I config that? I read this (
>>
>>
> https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files
>> )
>> but I failed to understand.
>>
>> I need to config solrconfig.xml and schema.xml for my custom schema.
>>
>>
>> Regards,
>> Hangu
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


>


Re: `cat /dev/null > solr-8983-console.log` frees host's memory

2015-10-21 Thread Eric Torti
Thank you Shawn, Timothy, Emir and Rajani.

Sorry, Shawn, I ended up cropping out the legend but you were right on
your guess.

Indeed, Timothy, this log is completely redundant. Will get rid of it soon.

I'll look into the resources you all pointed out. Thanks!

Best,

Eric Torti

On Wed, Oct 21, 2015 at 8:21 PM, Rajani Maski
 wrote:
> The details in this link [1] might be of help.
>
> [1]https://support.lucidworks.com/hc/en-us/articles/207072137
>
> On Wed, Oct 21, 2015 at 7:42 AM, Emir Arnautovic <
> emir.arnauto...@sematext.com> wrote:
>
>> Hi Eric,
>> As Shawn explained, memory is freed because it was used to cache portion
>> of log file.
>>
>> Since you are already with Sematext, I guess you are aware, but doesn't
>> hurt to remind you that we also have Logsene that you can use to manage
>> your logs: http://sematext.com/logsene/index.html
>>
>> Thanks,
>> Emir
>>
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
>>
>>
>> On 20.10.2015 17:42, Shawn Heisey wrote:
>>
>>> On 10/20/2015 9:19 AM, Eric Torti wrote:
>>>
 I had a 52GB solr-8983-console.log on my Solr 5.2.1 Amazon Linux
 64-bit box and decided to `cat /dev/null > solr-8983-console.log` to
 free space.

 The weird thing is that when I checked Sematext I noticed the OS had
 freed a lot of memory at the same exact instant I did that.

>>> On that memory graph, the legend doesn't indicate which of the graph
>>> colors represent each of the four usage types at the top -- they all
>>> have blue checkboxes, so I can't tell for sure what changed.
>>>
>>> If the number that dropped is "cached" (which I think is likely) then
>>> everything is working exactly as it should.  The OS had simply cached a
>>> large chunk of the logfile, exactly as it is designed to do, and once
>>> the file was deleted, it stopped reserving that memory and made it
>>> available.
>>>
>>> https://en.wikipedia.org/wiki/Page_cache
>>>
>>> Thanks,
>>> Shawn
>>>
>>>


Re: [newbie] Configuration for SolrCloud + DataImportHandler

2015-10-21 Thread Hangu Choi
Mikhail,

I didn't understand that's what I needed to do. Thank you.

But at the moment it is not working for me.
I am testing changing the configuration in SolrCloud with this command:

./zkcli.sh -zkhost localhost:9983 -cmd putfile /synonyms.txt
/usr/local/solr-5.3.1-test/server/scripts/cloud-scripts/synonyms.txt
and no error message occurred.

I then reloaded Solr via the core admin at localhost:8983.
Then I checked the synonyms.txt file at localhost:8983/solr/#/~cloud?view=tree
but nothing happened. What's wrong?




Regards,
Hangu

On Tue, Oct 20, 2015 at 9:18 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> did you try something like
> $> zkcli.sh -zkhost localhost:2181 -cmd putfile /solr.xml /path/to/solr.xml
> ?
>
> On Mon, Oct 19, 2015 at 11:15 PM, hangu choi  wrote:
>
> > Hi,
> >
> > I am trying to start SolrCloud with embedded ZooKeeper.
> >
> > I know how to config solrconfig.xml and schema.xml, and other things for
> > data import handler.
> > but when I trying to config it with solrCloud, I don't know where to
> start.
> >
> > I know there is no conf directory in SolrCloud because conf directory are
> > stored in ZooKeeper.
> > Then, how can I config that? I read this (
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files
> > )
> > but I failed to understand.
> >
> > I need to config solrconfig.xml and schema.xml for my custom schema.
> >
> >
> > Regards,
> > Hangu
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Re: LIX readability index calculation by solr

2015-10-21 Thread Toke Eskildsen
Roland Szűcs  wrote:
> My use case is that I have to calculate the LIX readability index for my
> documents.
[...]
> *B* = Number of periods (defined by period, colon or capital first letter)
[...]
> Does anybody have idea how to get the number of "periods"?

As the positions do not matter, you could make a copyField containing only
punctuation. And maybe extend it with a replace filter so that you have dot,
comma, colon, bang, question etc. instead of .,:!?

The capital first letter seems a bit strange to me - what about names? But 
anyway, you could do it with a PatternReplaceCharFilter, matching on something 
like 
([^.,:!?]\p{Space}*\p{Upper})|(^\p{Upper})
and replacing with 'capital' (the regexp above probably fails - it was just 
from memory).

- Toke Eskildsen
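
[Toke's two counting ideas can be prototyped outside Solr; a rough Python sketch, where the sample text and the port of the Java-style pattern are illustrative. Note how the name "Bob" trips the capital-letter rule, exactly the caveat raised above:]

```python
import re

text = "He left. She met Bob: alone. Then quiet"

# Boundaries marked by punctuation (period or colon, per the LIX definition)
punct = len(re.findall(r"[.:]", text))

# Rough port of ([^.,:!?]\p{Space}*\p{Upper})|(^\p{Upper}):
# a capital at the start, or one not directly following end punctuation
caps = len(re.findall(r"(?:^|[^.,:!?]\s+)([A-Z])", text))

print(punct, caps)   # 3 2 -- "Bob" is (wrongly) counted as a boundary
```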


Re: LIX readability index calculation by solr

2015-10-21 Thread Roland Szűcs
Thanks, Toke, for your quick response. All your suggestions seem to be very good ideas.
I found the capital letters strange too, because of names and places, so I will
skip this part, as I do not need an absolute measure, just a ranked order among
my documents.

cheers,
Roland



On Oct 21, 2015, at 11:25, Toke Eskildsen
 wrote:

> Roland Szűcs  wrote:
>> My use case is that I have to calculate the LIX readability index for my
>> documents.
> [...]
>> *B* = Number of periods (defined by period, colon or capital first letter)
> [...]
>> Does anybody have idea how to get the number of "periods"?
> 
> As the positions do not matter, you could make a copyField containing only
> punctuation. And maybe extend it with a replace filter so that you have dot,
> comma, colon, bang, question etc. instead of .,:!?
> 
> The capital first letter seems a bit strange to me - what about names? But 
> anyway, you could do it with a PatternReplaceCharFilter, matching on 
> something like 
> ([^.,:!?]\p{Space}*\p{Upper})|(^\p{Upper})
> and replacing with 'capital' (the regexp above probably fails - it was just 
> from memory).
> 
> - Toke Eskildsen


Re: [newbie] Configuration for SolrCloud + DataImportHandler

2015-10-21 Thread Hangu Choi
Mikhail,
I solved the problem; I had put the file at the wrong path. /synonyms.txt should be
/configs/gettingstarted/synonyms.txt.
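
[In other words, per-collection configuration files live under /configs/<configName>/ in ZooKeeper, not at the root. A sketch of the corrected upload follows; the config name and local path are assumptions, and the zkcli call itself is commented out since it needs a running cluster:]

```shell
CONFIG_NAME="gettingstarted"
ZK_PATH="/configs/${CONFIG_NAME}/synonyms.txt"
echo "$ZK_PATH"   # /configs/gettingstarted/synonyms.txt

# With the embedded ZooKeeper on port 9983:
# ./server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 \
#     -cmd putfile "$ZK_PATH" /path/to/synonyms.txt
# ...then RELOAD the collection so all replicas pick up the new file.
```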


Regards,
Hangu

On Wed, Oct 21, 2015 at 4:17 PM, Hangu Choi  wrote:

> Mikhail,
>
> I didn't understatnd that's what I need to do. thank you.
>
> but at the first moment, I am not doing well..
> I am testing to change configuration in solrcloud, through this command
>
> ./zkcli.sh -zkhost localhost:9983 -cmd putfile /synonyms.txt
> /usr/local/solr-5.3.1-test/server/scripts/cloud-scripts/synonyms.txt
> and no error message was occured.
>
> and then I reloaded solr at localhost:8983 coreAdmin.
> then I checked synonyms.txt file at localhost:8983/solr/#/~cloud?view=tree
> but nothing happend. what's wrong?
>
>
>
>
> Regards,
> Hangu
>
> On Tue, Oct 20, 2015 at 9:18 PM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
>> did you try something like
>> $> zkcli.sh -zkhost localhost:2181 -cmd putfile /solr.xml
>> /path/to/solr.xml
>> ?
>>
>> On Mon, Oct 19, 2015 at 11:15 PM, hangu choi  wrote:
>>
>> > Hi,
>> >
>> > I am trying to start SolrCloud with embedded ZooKeeper.
>> >
>> > I know how to config solrconfig.xml and schema.xml, and other things for
>> > data import handler.
>> > but when I trying to config it with solrCloud, I don't know where to
>> start.
>> >
>> > I know there is no conf directory in SolrCloud because conf directory
>> are
>> > stored in ZooKeeper.
>> > Then, how can I config that? I read this (
>> >
>> >
>> https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files
>> > )
>> > but I failed to understand.
>> >
>> > I need to config solrconfig.xml and schema.xml for my custom schema.
>> >
>> >
>> > Regards,
>> > Hangu
>> >
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Principal Engineer,
>> Grid Dynamics
>>
>> 
>> 
>>
>
>


LIX readability index calculation by solr

2015-10-21 Thread Roland Szűcs
Hi all,

My use case is that I have to calculate the LIX readability index for my
documents.

*LIX = A/B + (C x 100)/A*, where

*A* = Number of words
*B* = Number of periods (defined by period, colon or capital first letter)
*C* = Number of long words (More than 6 letters)

A can easily be computed if the index size does not matter: I define a field
in the schema without stemming and stop word elimination and use the term
vector component. I can count all the words, and I can easily count the
number of long words as well.
The only missing component is B.

Does anybody have an idea how to get the number of "periods"?

Cheers


-- 
Roland Szűcs
Connect with me on LinkedIn
CEO | Phone: +36 1 210 81 13 | Bookandwalk.hu
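
[A minimal sketch of the whole computation in plain Python, outside Solr. The boundary set (period and colon only) and the sample text are assumptions; the capital-first-letter rule is skipped, as discussed elsewhere in the thread:]

```python
import re

def lix(text: str) -> float:
    """LIX = A/B + (C * 100)/A."""
    words = re.findall(r"[^\W\d_]+", text)
    a = len(words)                            # A: number of words
    b = len(re.findall(r"[.:]", text))        # B: periods (period or colon only)
    c = sum(1 for w in words if len(w) > 6)   # C: words longer than 6 letters
    if a == 0 or b == 0:
        return 0.0
    return a / b + (c * 100) / a

sample = "Readability is measured simply. Longer words increase the score."
print(round(lix(sample), 2))   # 37.83
```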