CDCR with SSL enabled

2017-05-01 Thread Xie, Sean
Does CDCR support SSL encrypted SolrCloud?

I have two clusters started with SSL, and the CDCR setup instructions were 
followed on source and target. However, from the solr.log I'm not able to see 
that CDCR is occurring. I'm not sure what has been set up incorrectly.

From the solr.log, I can't find any useful info related to CDCR during 
indexing. Any help on how to probe the issue is appreciated.

The Target config:

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="buffer">
      <str name="defaultState">disabled</str>
    </lst>
  </requestHandler>

  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">cdcr-processor-chain</str>
    </lst>
  </requestHandler>

  <updateRequestProcessorChain name="cdcr-processor-chain">
    <processor class="solr.CdcrUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>

The Source config:

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="replica">
      <str name="zkHost">zk_ip:2181</str>
      <str name="source">SourceCollection</str>
      <str name="target">TargetCollection</str>
    </lst>
    <lst name="replicator">
      <str name="threadPoolSize">8</str>
      <str name="schedule">1000</str>
      <str name="batchSize">128</str>
    </lst>
    <lst name="updateLogSynchronizer">
      <str name="schedule">1000</str>
    </lst>
  </requestHandler>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>
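One thing worth checking before digging into SSL specifics: CDCR replication must be started explicitly on the source, and the CDCR API exposes monitoring actions that show whether anything is flowing. A sketch of the kind of probing to try (host name is a placeholder; collection name taken from the config above; -k assumes a self-signed certificate):

```shell
# Start replication on the source collection (it does not start by itself):
curl -k "https://source-host:8983/solr/SourceCollection/cdcr?action=START"

# Then probe the replication state:
curl -k "https://source-host:8983/solr/SourceCollection/cdcr?action=STATUS"  # process/buffer state
curl -k "https://source-host:8983/solr/SourceCollection/cdcr?action=QUEUES"  # queue sizes per target
curl -k "https://source-host:8983/solr/SourceCollection/cdcr?action=ERRORS"  # recent errors per target
```

If QUEUES shows entries piling up or ERRORS reports failures, that narrows the problem to source-to-target connectivity rather than the local config.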



Suggester uses lots of 'Page cache' memory

2017-05-01 Thread Damien Kamerman
Hi all,

I have a Solr v6.4.2 collection with 12 shards and 2 replicas. Each replica
uses about 14GB disk usage. I'm using Solaris 11 and I see the 'Page cache'
grow by about 7GB for each suggester replica I build. The suggester index
itself is very small. The 'Page cache' memory is freed when the node is
stopped.

I guess the Suggester component is mmap'ing the entire Lucene index into
memory and holding it? Is this expected behavior? Is there a workaround?

I use this command to build the suggester for just the replica
'target1_shard1_replica1':
curl "http://localhost:8983/solr/collection1/suggest?suggest.dictionary=mySuggester&suggest.build=true&shards=localhost:8983/solr/target1_shard1_replica1"

BTW: Without the 'shards' param the distributed request will randomly hit
half the replicas.
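A hedged variant of the same command: addressing the core directly and adding distrib=false should keep the build request on that one core instead of letting it fan out (core and dictionary names as above):

```shell
# Build the suggester only on this core; distrib=false suppresses the
# distributed fan-out (assumes the core name from the message above).
curl "http://localhost:8983/solr/target1_shard1_replica1/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.build=true&distrib=false"
```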

From my solrconfig.xml:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="indexPath">mySuggester</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">mySuggest</str>
      <str name="weightField">x</str>
      <str name="suggestAnalyzerFieldType">suggestTypeLc</str>
      <str name="buildOnStartup">false</str>
    </lst>
  </searchComponent>
Cheers,
Damien.


IndexFormatTooNewException - MapReduceIndexerTool for PDF files

2017-05-01 Thread ecos
Hi I'm getting the following error when trying to index PDF documents using
the MapReduceIndexerTool in Cloudera:


 

The cause of the error is:
org.apache.lucene.index.IndexFormatTooNewException: Format version is not
supported (resource: BufferedChecksumIndexInput (segments_1)): 4 (needs to
be between 0 and 3).

From what I've read, the exception is thrown when Lucene detects an
index that is newer than the Lucene version in use.

My configuration is:
SOLR: 4.10.3
Cloudera: 5.8.0
Hadoop: 2.6.0



In order to index, I'm following this tutorial:  

 

Using the following hadoop command:
hadoop jar /usr/lib/solr/contrib/mr/search-mr-*-job.jar \
org.apache.solr.hadoop.MapReduceIndexerTool \
-D mapreduce.job.maps=1 \
-D mapreduce.job.reduces=1 \
-D dfs.replication=1 \
--morphline-file /root/$COLLECTION/conf/pdf_morphlines.conf \
--output-dir hdfs://localhost:8020/user/$USER/outdir --verbose \
--solr-home-dir $HOME/$COLLECTION --shards 1 \
hdfs://localhost:8020/user/$USER/indir

The morphlines file:
pdf_morphlines.conf
  

And the schema file:
schema.xml   

Thank you.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/IndexFormatTooNewException-MapReduceIndexerTool-for-PDF-files-tp4332881.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr performance on EC2 linux

2017-05-01 Thread Jeff Wartes
Yes, that’s the Xenial I tried. Ubuntu 16.04.2 LTS.

On 5/1/17, 7:22 PM, "Will Martin"  wrote:

Ubuntu 16.04 LTS - Xenial (HVM)

Is this your Xenial version?




On 5/1/2017 6:37 PM, Jeff Wartes wrote:
> I tried a few variations of various things before we found and tried that 
linux/EC2 tuning page, including:
>- EC2 instance type: r4, c4, and i3
>- Ubuntu version: Xenial and Trusty
>- EBS vs local storage
>- Stock openjdk vs Zulu openjdk (Recent java8 in both cases - I’m 
aware of the issues with early java8 versions and I’m not using G1)
>
> Most of those attempts were to help reduce differences between the data 
center and the EC2 cluster. In all cases I re-indexed from scratch. I got the 
same very high system-time symptom in all cases. With the linux changes in 
place, we settled on r4/Xenial/EBS/Stock.
>
> Again, this was a slightly modified Solr 5.4, (I added backup requests, 
and two memory allocation rate tweaks that have long since been merged into 
mainline - released in 6.2 I think. I can dig up the jira numbers if anyone’s 
interested) I’ve never used Solr 6.x in production though.
> The only reason I mentioned 6.x at all is because I’m aware that ES 5.x 
is based on Lucene 6.2. I don’t believe my coworker spent any time on tuning 
his ES setup, although I think he did try G1.
>
> I definitely do want to binary-search those settings until I understand 
better what exactly did the trick.
> The problem is the long cycle time per test, but hopefully in the next 
couple of weeks.
>
>
>
> On 5/1/17, 7:26 AM, "John Bickerstaff"  wrote:
>
>  It's also very important to consider the type of EC2 instance you are
>  using...
>  
>  We settled on the R4.2XL...  The R series is labeled "High-Memory"
>  
>  Which instance type did you end up using?
>  
>  On Mon, May 1, 2017 at 8:22 AM, Shawn Heisey  
wrote:
>  
>  > On 4/28/2017 10:09 AM, Jeff Wartes wrote:
>  > > tldr: Recently, I tried moving an existing solrcloud 
configuration from
>  > a local datacenter to EC2. Performance was roughly 1/10th what I’d
>  > expected, until I applied a bunch of linux tweaks.
>  >
>  > How very strange.  I knew virtualization would have overhead, 
possibly
>  > even measurable overhead, but that's insane.  Running on bare 
metal is
>  > always better if you can do it.  I would be curious what would 
happen on
>  > your original install if you applied similar tuning to that.  
Would you
>  > see a speedup there?
>  >
>  > > Interestingly, a coworker playing with a ElasticSearch (ES 5.x, 
so a
>  > much more recent release) alternate implementation of the same 
index was
>  > not seeing this high-system-time behavior on EC2, and was getting
>  > throughput consistent with our general expectations.
>  >
>  > That's even weirder.  ES 5.x will likely be using Points field 
types for
>  > numeric fields, and although those are faster than what Solr 
currently
>  > uses, I doubt it could explain that difference.  The implication 
here is
>  > that the ES systems are running with stock EC2 settings, not the 
tuned
>  > settings ... but I'd like you to confirm that.  Same Java version 
as
>  > with Solr?  IMHO, Java itself is more likely to cause issues like 
you
>  > saw than Solr.
>  >
>  > > I’m writing this for a few reasons:
>  > >
>  > > 1.   The performance difference was so crazy I really feel 
like this
>  > should really be broader knowledge.
>  >
>  > Definitely agree!  I would be very interested in learning which of 
the
>  > tunables you changed were major contributors to the improvement.  
If it
>  > turns out that Solr's code is sub-optimal in some way, maybe we 
can fix it.
>  >
>  > > 2.   If anyone is aware of anything that changed in Lucene 
between
>  > 5.4 and 6.x that could explain why Elasticsearch wasn’t suffering 
from
>  > this? If it’s the clocksource that’s the issue, there’s an 
implication that
>  > Solr was using tons more system calls like gettimeofday that the 
EC2 (xen)
>  > hypervisor doesn’t allow in userspace.
>  >
>  > I had not considered the performance regression in 6.4.0 and 6.4.1 
that
>  > Erick mentioned.  Were you still running Solr 5.4, or was it a 6.x 
version?
>  >
>  > =
>  >
>  > Specific thoughts on the tuning:
>  >
>  > The noatime option is very good to use.  I also use nodiratime on 
my
>  > systems.  Turning these 

Re: Solr performance on EC2 linux

2017-05-01 Thread Jeff Wartes
I started with the same three-node 15-shard configuration I’d been used to, in 
an RF1 cluster. (the index is almost 700G so this takes three r4.8xlarge’s if I 
want to be entirely memory-resident) I eventually dropped down to a 1/3rd size 
index on a single node (so 5 shards, 100M docs each) so I could test 
configurations more quickly. The system time usage was present on all solr 
nodes regardless. I adjusted for a difference in the CPU count on the EC2 nodes 
when I picked my load testing rates. 

Zookeeper is a separate cluster on separate nodes. It is NOT collocated with 
Solr, although it’s dedicated exclusively to Solr’s use.

I specify a timeout on all queries, and as mentioned, use SOLR-4449. So there’s 
possibly an argument I’m doing a lot more timing related calls than most. 
There’s nothing particularly exotic there though, just another Executor 
Service, and you’ll never get a backup request on an RF1 cluster because 
there’s no alternate to try. 


On 5/1/17, 6:28 PM, "Walter Underwood"  wrote:

Might want to measure the single CPU performance of your EC2 instance. The 
last time I checked, my MacBook was twice as fast as the EC2 instance I was 
using.

wunder
Walter Underwood
wun...@wunderwood.org

http://observer.wunderwood.org/  (my blog)


> On May 1, 2017, at 6:24 PM, Chris Hostetter  
wrote:
> 
> 
> : tldr: Recently, I tried moving an existing solrcloud configuration from 
> : a local datacenter to EC2. Performance was roughly 1/10th what I’d 
> : expected, until I applied a bunch of linux tweaks.
> 
> How many total nodes in your cluster?  How many of them running ZooKeeper?
> 
> Did you observe the heavy increase in system time CPU usage on all nodes, 
> or just the ones running zookeeper?
> 
> I ask because if your speculation is correct and it is an issue of 
> clocksource, then perhaps ZK is where the majority of those system calls 
> are happening, and perhaps that's why you didn't see any similar heavy 
> system CPU load in ES?  
> 
> (Then again: at the lowest levels "lucene" really shouldn't care about 
> anything clock related at all. Any "time" related code would live in the 
> Solr level ... hmmm.)
> 
> 
> -Hoss
> 
http://www.lucidworks.com/





Re: Solr performance on EC2 linux

2017-05-01 Thread Will Martin
Ubuntu 16.04 LTS - Xenial (HVM)

Is this your Xenial version?




On 5/1/2017 6:37 PM, Jeff Wartes wrote:
> I tried a few variations of various things before we found and tried that 
> linux/EC2 tuning page, including:
>- EC2 instance type: r4, c4, and i3
>- Ubuntu version: Xenial and Trusty
>- EBS vs local storage
>- Stock openjdk vs Zulu openjdk (Recent java8 in both cases - I’m aware of 
> the issues with early java8 versions and I’m not using G1)
>
> Most of those attempts were to help reduce differences between the data 
> center and the EC2 cluster. In all cases I re-indexed from scratch. I got the 
> same very high system-time symptom in all cases. With the linux changes in 
> place, we settled on r4/Xenial/EBS/Stock.
>
> Again, this was a slightly modified Solr 5.4, (I added backup requests, and 
> two memory allocation rate tweaks that have long since been merged into 
> mainline - released in 6.2 I think. I can dig up the jira numbers if anyone’s 
> interested) I’ve never used Solr 6.x in production though.
> The only reason I mentioned 6.x at all is because I’m aware that ES 5.x is 
> based on Lucene 6.2. I don’t believe my coworker spent any time on tuning his 
> ES setup, although I think he did try G1.
>
> I definitely do want to binary-search those settings until I understand 
> better what exactly did the trick.
> The problem is the long cycle time per test, but hopefully in the next 
> couple of weeks.
>
>
>
> On 5/1/17, 7:26 AM, "John Bickerstaff"  wrote:
>
>  It's also very important to consider the type of EC2 instance you are
>  using...
>  
>  We settled on the R4.2XL...  The R series is labeled "High-Memory"
>  
>  Which instance type did you end up using?
>  
>  On Mon, May 1, 2017 at 8:22 AM, Shawn Heisey  wrote:
>  
>  > On 4/28/2017 10:09 AM, Jeff Wartes wrote:
>  > > tldr: Recently, I tried moving an existing solrcloud configuration 
> from
>  > a local datacenter to EC2. Performance was roughly 1/10th what I’d
>  > expected, until I applied a bunch of linux tweaks.
>  >
>  > How very strange.  I knew virtualization would have overhead, possibly
>  > even measurable overhead, but that's insane.  Running on bare metal is
>  > always better if you can do it.  I would be curious what would happen 
> on
>  > your original install if you applied similar tuning to that.  Would you
>  > see a speedup there?
>  >
>  > > Interestingly, a coworker playing with a ElasticSearch (ES 5.x, so a
>  > much more recent release) alternate implementation of the same index 
> was
>  > not seeing this high-system-time behavior on EC2, and was getting
>  > throughput consistent with our general expectations.
>  >
>  > That's even weirder.  ES 5.x will likely be using Points field types 
> for
>  > numeric fields, and although those are faster than what Solr currently
>  > uses, I doubt it could explain that difference.  The implication here 
> is
>  > that the ES systems are running with stock EC2 settings, not the tuned
>  > settings ... but I'd like you to confirm that.  Same Java version as
>  > with Solr?  IMHO, Java itself is more likely to cause issues like you
>  > saw than Solr.
>  >
>  > > I’m writing this for a few reasons:
>  > >
>  > > 1.   The performance difference was so crazy I really feel like 
> this
>  > should really be broader knowledge.
>  >
>  > Definitely agree!  I would be very interested in learning which of the
>  > tunables you changed were major contributors to the improvement.  If it
>  > turns out that Solr's code is sub-optimal in some way, maybe we can 
> fix it.
>  >
>  > > 2.   If anyone is aware of anything that changed in Lucene 
> between
>  > 5.4 and 6.x that could explain why Elasticsearch wasn’t suffering from
>  > this? If it’s the clocksource that’s the issue, there’s an implication 
> that
>  > Solr was using tons more system calls like gettimeofday that the EC2 
> (xen)
>  > hypervisor doesn’t allow in userspace.
>  >
>  > I had not considered the performance regression in 6.4.0 and 6.4.1 that
>  > Erick mentioned.  Were you still running Solr 5.4, or was it a 6.x 
> version?
>  >
>  > =
>  >
>  > Specific thoughts on the tuning:
>  >
>  > The noatime option is very good to use.  I also use nodiratime on my
>  > systems.  Turning these off can have *massive* impacts on disk
>  > performance.  If these are the source of the speedup, then the machine
>  > doesn't have enough spare memory.
>  >
>  > I'd be wary of the "nobarrier" mount option.  If the underlying storage
>  > has battery-backed write caches, or is SSD without write caching, it
>  > wouldn't be a problem.  Here's info about the "discard" mount option, I
>  > 

Re: CDCR & firewall holes

2017-05-01 Thread Susheel Kumar
I believe you need to open
a) ports from source cluster  to target zookeepers (usually 2181 unless you
change it)
b) ports from source to target solr ports (usually 8983 unless you change
it)

Thanks,
Susheel
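A quick way to verify those holes once they have been opened is a plain TCP check from a source-cluster Solr node (host names here are placeholders, ports are the usual defaults mentioned above):

```shell
# Run from a source-cluster Solr node; hostnames are assumptions.
nc -vz target-zk1.example.com 2181    # source Solr -> target ZooKeeper
nc -vz target-solr1.example.com 8983  # source Solr -> target Solr
```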

On Mon, May 1, 2017 at 2:17 PM, Oakley, Craig (NIH/NLM/NCBI) [C] <
craig.oak...@nih.gov> wrote:

> We are considering using Cross Data Center Replication between SolrClouds
> in different domains which have a firewall between them. Is it documented
> anywhere how many firewall holes will be needed? From each source SolrCloud
> node to each target SolrCloud node? From each target SolrCloud node to each
> source SolrCloud node? From each source SolrCloud node to each target
> Zookeeper node? Do the target SolrCloud nodes ever need to talk to the
> source Zookeeper nodes (or vice versa)? Is there a need for communication
> between the two Zookeeper clusters?
>
> Thanks
>
>


Re: Both main and replica are trying to access solr_gc.log.0.current file

2017-05-01 Thread Zheng Lin Edwin Yeo
Is this the correct way to start both of the replicas?

bin\solr.cmd start -cloud -p 8983 -s solr\node1\solr -m 8g -z
"localhost:9981,localhost:9982,localhost:9983"

bin\solr.cmd start -cloud -p 8984 -s solr\node2\solr -m 8g -z
"localhost:9981,localhost:9982,localhost:9983"

Regards,
Edwin
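On the log-contention side, one sketch of a fix is to point each instance at its own log directory before starting it. The directory names here are invented, and this assumes the 6.x start scripts honor the SOLR_LOGS_DIR environment variable (check your solr.in.cmd):

```shell
REM Give each node its own log directory so the two instances do not
REM contend for solr_gc.log.0.current (paths are assumptions).
set SOLR_LOGS_DIR=C:\edwin\solr\node1\logs
bin\solr.cmd start -cloud -p 8983 -s solr\node1\solr -m 8g -z "localhost:9981,localhost:9982,localhost:9983"

set SOLR_LOGS_DIR=C:\edwin\solr\node2\logs
bin\solr.cmd start -cloud -p 8984 -s solr\node2\solr -m 8g -z "localhost:9981,localhost:9982,localhost:9983"
```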


On 30 April 2017 at 17:35, Zheng Lin Edwin Yeo  wrote:

> I'm starting Solr with this command:
>
> bin\solr.cmd start -cloud -p 8983 -s solr\node1\solr -m 8g -z
> "localhost:9981,localhost:9982,localhost:9983"
>
> bin\solr.cmd start -cloud -p 8984 -s solr\node2\solr -m 8g -z
> "localhost:9981,localhost:9982,localhost:9983"
>
> Regards,
> Edwin
>
> On 30 April 2017 at 13:52, Mike Drob  wrote:
>
>> It might depend somewhat on how you are starting Solr (I am less familiar with
>> Windows) but you will need to give each instance a separate
>> log4j.properties
>> file and configure the log location in there.
>>
>> Also check out the Solr Ref Guide section on Configuring Logging,
>> subsection Permanent Logging Settings.
>>
>> https://cwiki.apache.org/confluence/display/solr/Configuring+Logging
>>
>> Mike
>>
>> On Sat, Apr 29, 2017, 12:24 PM Zheng Lin Edwin Yeo 
>> wrote:
>>
>> > Yes, both Solr instances are running in the same hardware.
>> >
>> > I believe they are pointing to the same log directories/config too.
>> >
>> > How do we point them to different log directories/config?
>> >
>> > Regards,
>> > Edwin
>> >
>> >
>> > On 30 April 2017 at 00:36, Mike Drob  wrote:
>> >
>> > > Are you running both Solr instances in the same hardware and pointing
>> > them
>> > > at the same log directories/config?
>> > >
>> > > On Sat, Apr 29, 2017, 2:56 AM Zheng Lin Edwin Yeo <
>> edwinye...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I'm using Solr 6.4.2 on SolrCloud, and I'm running 2 replica of
>> Solr.
>> > > >
>> > > > When I start the replica, I will encounter this error message. It is
>> > > > probably due to the Solr log, as both the main and the replica are
>> > trying
>> > > > to access the same solr_gc.log.0.current file.
>> > > >
>> > > > Is there anyway to prevent this?
>> > > >
>> > > > Besides this error message, the rest of the Solr for both main and
>> > > replica
>> > > > are running normally.
>> > > >
>> > > > Exception in thread "main" java.nio.file.FileSystemException:
>> > > > C:\edwin\solr\server\logs\solr_gc.log.0.current ->
>> > > > C:\edwin\solr\server\logs\archived\solr_gc.log.0.current: The process
>> > > > cannot access the file because it is being used by another process.
>> > > >
>> > > > at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
>> > > > at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
>> > > > at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
>> > > > at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
>> > > > at java.nio.file.Files.move(Files.java:1395)
>> > > > at org.apache.solr.util.SolrCLI$UtilsTool.archiveGcLogs(SolrCLI.java:3579)
>> > > > at org.apache.solr.util.SolrCLI$UtilsTool.runTool(SolrCLI.java:3548)
>> > > > at org.apache.solr.util.SolrCLI.main(SolrCLI.java:250)
>> > > > "Failed archiving old GC logs"
>> > > > Exception in thread "main" java.nio.file.FileSystemException:
>> > > > C:\edwin\solr\server\logs\solr-8983-console.log ->
>> > > > C:\edwin\solr\server\logs\archived\solr-8983-console.log: The process
>> > > > cannot access the file because it is being used by another process.
>> > > >
>> > > > at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
>> > > > at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
>> > > > at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
>> > > > at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
>> > > > at java.nio.file.Files.move(Files.java:1395)
>> > > > at org.apache.solr.util.SolrCLI$UtilsTool.archiveConsoleLogs(SolrCLI.java:3608)
>> > > > at org.apache.solr.util.SolrCLI$UtilsTool.runTool(SolrCLI.java:3551)
>> > > > at org.apache.solr.util.SolrCLI.main(SolrCLI.java:250)
>> > > > "Failed archiving old console logs"
>> > > > Exception in thread "main" java.nio.file.FileSystemException:
>> > > > C:\edwin\solr\server\logs\solr.log -> C:\edwin\solr\server\logs\solr.log.1:
>> > > > The process cannot access the file because it is being used by another process.
>> > > >
>> > > > at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)

Joining more than 2 collections

2017-05-01 Thread Zheng Lin Edwin Yeo
Hi,

Is it possible to join more than 2 collections using one of the streaming
expressions (Eg: innerJoin)? If not, is there other ways we can do it?

Currently, I may need to join 3 or 4 collections together, and to output
selected fields from all these collections together.

I'm using Solr 6.4.2.
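Since streaming expressions compose, one sketch (the collection and field names here are invented) is to feed the output of one innerJoin into another. Each inner stream has to be sorted on the join key, and innerJoin preserves that sort, so the nesting works:

```shell
# Hypothetical three-way equi-join on "personId"; every search() stream
# is sorted on the join key, as innerJoin requires.
curl --data-urlencode 'expr=innerJoin(
  innerJoin(
    search(people, q="*:*", fl="personId,name", sort="personId asc"),
    search(pets, q="*:*", fl="personId,petName", sort="personId asc"),
    on="personId"),
  search(cars, q="*:*", fl="personId,carModel", sort="personId asc"),
  on="personId")' \
  "http://localhost:8983/solr/people/stream"
```

For a four-way join, the same pattern nests one level deeper.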

Regards,
Edwin


Re: Solr performance on EC2 linux

2017-05-01 Thread Walter Underwood
Might want to measure the single CPU performance of your EC2 instance. The last 
time I checked, my MacBook was twice as fast as the EC2 instance I was using.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On May 1, 2017, at 6:24 PM, Chris Hostetter  wrote:
> 
> 
> : tldr: Recently, I tried moving an existing solrcloud configuration from 
> : a local datacenter to EC2. Performance was roughly 1/10th what I’d 
> : expected, until I applied a bunch of linux tweaks.
> 
> How many total nodes in your cluster?  How many of them running ZooKeeper?
> 
> Did you observe the heavy increase in system time CPU usage on all nodes, 
> or just the ones running zookeeper?
> 
> I ask because if your speculation is correct and it is an issue of 
> clocksource, then perhaps ZK is where the majority of those system calls 
> are happening, and perhaps that's why you didn't see any similar heavy 
> system CPU load in ES?  
> 
> (Then again: at the lowest levels "lucene" really shouldn't care about 
> anything clock related at all. Any "time" related code would live in the 
> Solr level ... hmmm.)
> 
> 
> -Hoss
> http://www.lucidworks.com/



Re: Solr performance on EC2 linux

2017-05-01 Thread Chris Hostetter

: tldr: Recently, I tried moving an existing solrcloud configuration from 
: a local datacenter to EC2. Performance was roughly 1/10th what I’d 
: expected, until I applied a bunch of linux tweaks.

How many total nodes in your cluster?  How many of them running ZooKeeper?

Did you observe the heavy increase in system time CPU usage on all nodes, 
or just the ones running zookeeper?

I ask because if your speculation is correct and it is an issue of 
clocksource, then perhaps ZK is where the majority of those system calls 
are happening, and perhaps that's why you didn't see any similar heavy 
system CPU load in ES?  

(Then again: at the lowest levels "lucene" really shouldn't care about 
anything clock related at all. Any "time" related code would live in the 
Solr level ... hmmm.)


-Hoss
http://www.lucidworks.com/

Re: Solr performance on EC2 linux

2017-05-01 Thread Jeff Wartes
I tried a few variations of various things before we found and tried that 
linux/EC2 tuning page, including:
  - EC2 instance type: r4, c4, and i3
  - Ubuntu version: Xenial and Trusty
  - EBS vs local storage
  - Stock openjdk vs Zulu openjdk (Recent java8 in both cases - I’m aware of 
the issues with early java8 versions and I’m not using G1)

Most of those attempts were to help reduce differences between the data center 
and the EC2 cluster. In all cases I re-indexed from scratch. I got the same 
very high system-time symptom in all cases. With the linux changes in place, we 
settled on r4/Xenial/EBS/Stock.

Again, this was a slightly modified Solr 5.4, (I added backup requests, and two 
memory allocation rate tweaks that have long since been merged into mainline - 
released in 6.2 I think. I can dig up the jira numbers if anyone’s interested) 
I’ve never used Solr 6.x in production though. 
The only reason I mentioned 6.x at all is because I’m aware that ES 5.x is 
based on Lucene 6.2. I don’t believe my coworker spent any time on tuning his 
ES setup, although I think he did try G1.

I definitely do want to binary-search those settings until I understand better 
what exactly did the trick. 
The problem is the long cycle time per test, but hopefully in the next 
couple of weeks.



On 5/1/17, 7:26 AM, "John Bickerstaff"  wrote:

It's also very important to consider the type of EC2 instance you are
using...

We settled on the R4.2XL...  The R series is labeled "High-Memory"

Which instance type did you end up using?

On Mon, May 1, 2017 at 8:22 AM, Shawn Heisey  wrote:

> On 4/28/2017 10:09 AM, Jeff Wartes wrote:
> > tldr: Recently, I tried moving an existing solrcloud configuration from
> a local datacenter to EC2. Performance was roughly 1/10th what I’d
> expected, until I applied a bunch of linux tweaks.
>
> How very strange.  I knew virtualization would have overhead, possibly
> even measurable overhead, but that's insane.  Running on bare metal is
> always better if you can do it.  I would be curious what would happen on
> your original install if you applied similar tuning to that.  Would you
> see a speedup there?
>
> > Interestingly, a coworker playing with a ElasticSearch (ES 5.x, so a
> much more recent release) alternate implementation of the same index was
> not seeing this high-system-time behavior on EC2, and was getting
> throughput consistent with our general expectations.
>
> That's even weirder.  ES 5.x will likely be using Points field types for
> numeric fields, and although those are faster than what Solr currently
> uses, I doubt it could explain that difference.  The implication here is
> that the ES systems are running with stock EC2 settings, not the tuned
> settings ... but I'd like you to confirm that.  Same Java version as
> with Solr?  IMHO, Java itself is more likely to cause issues like you
> saw than Solr.
>
> > I’m writing this for a few reasons:
> >
> > 1.   The performance difference was so crazy I really feel like this
> should really be broader knowledge.
>
> Definitely agree!  I would be very interested in learning which of the
> tunables you changed were major contributors to the improvement.  If it
> turns out that Solr's code is sub-optimal in some way, maybe we can fix 
it.
>
> > 2.   If anyone is aware of anything that changed in Lucene between
> 5.4 and 6.x that could explain why Elasticsearch wasn’t suffering from
> this? If it’s the clocksource that’s the issue, there’s an implication 
that
> Solr was using tons more system calls like gettimeofday that the EC2 (xen)
> hypervisor doesn’t allow in userspace.
>
> I had not considered the performance regression in 6.4.0 and 6.4.1 that
> Erick mentioned.  Were you still running Solr 5.4, or was it a 6.x 
version?
>
> =
>
> Specific thoughts on the tuning:
>
> The noatime option is very good to use.  I also use nodiratime on my
> systems.  Turning these off can have *massive* impacts on disk
> performance.  If these are the source of the speedup, then the machine
> doesn't have enough spare memory.
>
> I'd be wary of the "nobarrier" mount option.  If the underlying storage
> has battery-backed write caches, or is SSD without write caching, it
> wouldn't be a problem.  Here's info about the "discard" mount option, I
> don't know whether it applies to your amazon storage:
>
>discard/nodiscard
>   Controls  whether ext4 should issue discard/TRIM commands
> to the
>   underlying block device when blocks are freed.  This  is
> useful
>   for  SSD  devices  and sparse/thinly-provisioned LUNs, but
> it is
>   

Re: Building Solr greater than 6.2.1

2017-05-01 Thread Alexandre Rafalovitch
I think there was a Java compiler bug that was introduced and then fixed.
It took me two days to figure that out when I hit it a while ago.

Regards,
Alex

On 1 May 2017 1:00 PM, "Ryan Yacyshyn"  wrote:

I was using Java 8 all along but more specifically, it was 1.8.0_25 (full
details below).

java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

I initially didn't think it was my Java version so I just cleared my ivy
cache and tried building again but it failed. Only after updating to:

java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

did it work.

Regards,
Ryan



On Mon, 1 May 2017 at 20:44 Shawn Heisey  wrote:

> On 5/1/2017 6:34 AM, Ryan Yacyshyn wrote:
> > Thanks Alex, it's working now. I had to update Java.
>
> What version were you using?  Lucene/Solr 6 requires Java 8.  I don't
> think that building 6.2.1 would have been successful if it weren't Java 8.
>
> I'm not familiar with any specific Java release requirements (more
> specific than version 8) for any 6.x version.
>
> Thanks,
> Shawn
>
>


CDCR & firewall holes

2017-05-01 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
We are considering using Cross Data Center Replication between SolrClouds in 
different domains which have a firewall between them. Is it documented anywhere 
how many firewall holes will be needed? From each source SolrCloud node to each 
target SolrCloud node? From each target SolrCloud node to each source SolrCloud 
node? From each source SolrCloud node to each target Zookeeper node? Do the 
target SolrCloud nodes ever need to talk to the source Zookeeper nodes (or vice 
versa)? Is there a need for communication between the two Zookeeper clusters?

Thanks



choosing placement upon RESTORE

2017-05-01 Thread xavier jmlucjav
hi,

I am facing this situation:
- I have a 3-node Solr 6.1 cluster with some 1-shard, 1-node collections (it's just
for dev work)
- the collections where created with:
   action=CREATE&...&createNodeSet=EMPTY"
then
  action=ADDREPLICA&...&node=$NODEA&dataDir=$DATADIR"
- I have taken a BACKUP of the collections
- Solr is upgraded to 6.5.1

Now, I started using RESTORE to restore the collections on node A (where they
lived before), but instead of all being created on node A, the collections were
created on nodes A, then B, then C. Well, SolrCloud tried to: the 2nd and 3rd
RESTOREs failed, since the backup was on node A's disk, not reachable from
nodes B and C.

How is this supposed to work? I am looking at Rule Based Placement, but it
seems to be available only for CREATESHARD, so can I use it in RESTORE?
Isn't there a way to force Solrcloud to create the collection in a given
node?

thanks!


Re: Step By Step guide to create Solr Cloud in Solr 6.x

2017-05-01 Thread Erick Erickson
First, you should not have to restart Solr. Second, generally Solr
will distribute replicas fairly evenly, just use the Collections API,
CREATE command and optionally supply a "nodeSet" parameter.

If you really require exact placement of replicas on exact machines
(which I contend you probably do not), use the special nodeset EMPTY
which will create _no_ replicas. Then use ADDREPLICA with the "node"
parameter to place replicas on exact machines.
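A hedged sketch of those two calls (the collection, config, and node names are placeholders):

```shell
# 1) Create the collection with no replicas anywhere.
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=1&replicationFactor=1&createNodeSet=EMPTY&collection.configName=myconf"

# 2) Place each replica on the exact node you want.
curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=server1:8983_solr"
```

The node value uses the node-name form shown in the cluster state (host:port_solr), not a bare hostname.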

DO NOT USE the core admin API unless you have really unusual needs.
That API is low-level and you have to get everything exactly right.

Best,
Erick

On Sun, Apr 30, 2017 at 8:28 PM, Nilesh Kamani  wrote:
> UPDATE -
>
> After restarting the server, I can see that issue has been resolved for now.
>
>
> On Sun, Apr 30, 2017 at 11:12 PM, Nilesh Kamani 
> wrote:
>
>> UPDATE -
>>
>> Able to get shard1 on server 1 and shard2 on server 2, and a core on server 1
>> in the cluster.
>>
>> How can I add another node/core to cluster which is on server 2.
>>
>>
>>
>>
>> On Sun, Apr 30, 2017 at 9:48 PM, Nilesh Kamani 
>> wrote:
>>
>>> Hello All,
>>>
>>> Sorry to bother you all again. I am having hard time understanding solr
>>> terminologies.
>>>
>>> Is there any step by step guide to create solr cloud in Solr 6.x ?
>>>
>>> I have two servers on my google cloud and have installed solr on both of
>>> them.
>>>
>>> I would like to create one collection, shard1 on server1, shard2 on
>>> server2, (replicas).
>>>
>>> I want to index a few GBs of documents on Shard1/Server1 and a few GBs of
>>> documents on Shard2/Server2.
>>>
>>> Could you please point me to a link or video ?
>>>
>>> Thanks,
>>> Nilesh Kamani
>>>
>>>
>>>
>>>
>>


Re: Slow indexing speed when collection size is large

2017-05-01 Thread Zheng Lin Edwin Yeo
Hi Rick,

I'm using Solrj for the indexing, not using curl.
Normally I bundle about 1000 documents for each POST.
There's more than 300GB of RAM for that server, and I do not use any
sharding at the moment.

Regards,
Edwin
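As an aside, the bundling itself is easy to factor out; a sketch in Python (the batch size is just a tuning knob, 1000 here to match the above):

```python
def batches(docs, size=1000):
    """Yield successive fixed-size batches of documents for indexing."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

# Each yielded batch would be sent as one add/POST
# (e.g. solrClient.add(batch) in SolrJ, followed by a commit at the end).
sizes = [len(b) for b in batches(list(range(2500)))]
print(sizes)  # [1000, 1000, 500]
```

Running several such batch streams in parallel processes or threads is what gives the throughput boost Rick mentions.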


On 1 May 2017 at 19:08, Rick Leir  wrote:

> Zheng,
> Are you POSTing using curl? Get several processes working in parallel to
> get a small boost. Solrj should speed you up a bit too (numbers anyone?).
> How many documents do you bundle in a POST?
>
> Do you have lots of RAM? Sharding?
> Cheers -- Rick
>
> On April 30, 2017 10:39:29 PM EDT, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com> wrote:
> >Hi,
> >
> >I'm using Solr 6.4.2.
> >
> >Would like to check, if there are alot of collections in my Solr which
> >has
> >very large index size, will the indexing speed be affected?
> >
> >Currently, I have created a new collections in Solr which has several
> >collections with very large index size, and the indexing speed is much
> >slower than expected.
> >
> >Regards,
> >Edwin
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com


Re: After upgrade to Solr 6.5, q.op=AND affects filter query differently than in older version

2017-05-01 Thread Shawn Heisey
On 5/1/2017 9:19 AM, Andy C wrote:
> You state that the best-performing query that gives the desired results is:
>> fq=ctindex:myId OR (*:* -ctindex:[* TO *])
> Is this because there is some sort of optimization invoked when you use
> [* TO *], or just because a single range will be more efficient than
> multiple ranges ORed together?

There are fewer query clauses, so it takes less time.  The "all values"
range *might* perform faster than a range with a specific endpoint,
although I'm not familiar enough with the code to say for sure.

> I was considering generating an additional field "ctindex_populated" that
> would contain true or false depending on whether a ctindex value is
> present. And then changing the filter query to:
>
> fq=ctindex_populated:false OR ctindex:myId
>
> Would this be more efficient than your proposed filter query?

Yes.  Probably a lot more efficient.  Boolean fields only have two
possible values, so queries on those fields tend to be extremely fast.

Thanks,
Shawn



Re: After upgrade to Solr 6.5, q.op=AND affects filter query differently than in older version

2017-05-01 Thread Andy C
Thanks for the response Shawn.

Adding "*:*" in front of my filter query does indeed resolve the issue. It
seems odd to me that the fully negated query does work if I don't set
q.op=AND. I guess this must be "adding complexity". Actually I just
discovered that simply removing the extraneous outer parentheses
[ fq=-ctindex:({* TO "MyId"} OR {"MyId" TO *}) ] also resolves the issue.

You state that the best-performing query that gives the desired results is:

> fq=ctindex:myId OR (*:* -ctindex:[* TO *])

Is this because there is some sort of optimization invoked when you use
[* TO *], or just because a single range will be more efficient than multiple
ranges ORed together?

I was considering generating an additional field "ctindex_populated" that
would contain true or false depending on whether a ctindex value is
present. And then changing the filter query to:

fq=ctindex_populated:false OR ctindex:myId

Would this be more efficient than your proposed filter query?

Thanks again,
- Andy -

On Mon, May 1, 2017 at 10:19 AM, Shawn Heisey  wrote:

> On 4/26/2017 1:04 PM, Andy C wrote:
> > I'm looking at upgrading the version of Solr used with our application
> from
> > 5.3 to 6.5.
> >
> > Having an issue with a change in the behavior of one of the filter
> queries
> > we generate.
> >
> > The field "ctindex" is only present in a subset of documents. It
> basically
> > contains a user id. For those documents where it is present, I only want
> > documents returned where the ctindex value matches the id of the user
> > performing the search. Documents with no ctindex value should be returned
> > as well.
> >
> > This is implemented through a filter query that excludes documents that
> > contain some other value in the ctindex field: fq=(-ctindex:({* TO
> "MyId"}
> > OR {"MyId" TO *}))
>
> I am surprised that this works in 5.3.  The crux of the problem is that
> fully negative query clauses do not actually work.
>
> Here's the best-performing query that gives you the results you want:
>
> fq=ctindex:myId OR (*:* -ctindex:[* TO *])
>
> The *:* is needed in the second clause to give the query a starting
> point of all documents, from which is subtracted all documents where
> ctindex has a value.  Without the "all docs" starting point, you are
> subtracting from nothing, which yields nothing.
>
> You may notice that this query works perfectly, and wonder why:
>
> fq=-ctindex:[* TO *]
>
> This works because on such a simple query, Solr is able to detect that
> it is fully negated, so it implicitly adds the *:* starting point for
> you.  As soon as you implement any kind of complexity (multiple clauses,
> parentheses, etc) that detection doesn't work.
>
> Thanks,
> Shawn
>
>


Re: Clean checkbox on DIH

2017-05-01 Thread Shawn Heisey
On 4/28/2017 9:01 AM, Mahmoud Almokadem wrote:
> We already using a shell scripts to do our import and using fullimport
> command to do our delta import and everything is doing well several
> years ago. But default of the UI is full import with clean and commit.
> If I press the Execute button by mistake the whole index is cleaned
> without any notification.

I understand your frustration.  What I'm worried about is the fallout if
we change the default to be unchecked, from people who didn't verify the
setting and expected full-import to wipe their index before it started
importing, just like it has always done for the last few years.

The default value for the clean parameter when NOT using the admin UI is
true for full-import, and false for delta-import.  That's not going to
change.  I firmly believe that the admin UI should have the same
defaults as the API itself.  The very nature of a full-import carries
the implication that you want to start over with an empty index.

What if there were some bright red text in the UI near the execute
button that urged you to double-check that the "clean" box has the
setting you want?  An alternate idea would be to pop up a yes/no
verification dialog on execute when the clean box is checked.

Thanks,
Shawn



Re: Solr performance on EC2 linux

2017-05-01 Thread John Bickerstaff
It's also very important to consider the type of EC2 instance you are
using...

We settled on the R4.2XL...  The R series is labeled "High-Memory"

Which instance type did you end up using?

On Mon, May 1, 2017 at 8:22 AM, Shawn Heisey  wrote:

> On 4/28/2017 10:09 AM, Jeff Wartes wrote:
> > tldr: Recently, I tried moving an existing solrcloud configuration from
> a local datacenter to EC2. Performance was roughly 1/10th what I’d
> expected, until I applied a bunch of linux tweaks.
>
> How very strange.  I knew virtualization would have overhead, possibly
> even measurable overhead, but that's insane.  Running on bare metal is
> always better if you can do it.  I would be curious what would happen on
> your original install if you applied similar tuning to that.  Would you
> see a speedup there?
>
> > Interestingly, a coworker playing with a ElasticSearch (ES 5.x, so a
> much more recent release) alternate implementation of the same index was
> not seeing this high-system-time behavior on EC2, and was getting
> throughput consistent with our general expectations.
>
> That's even weirder.  ES 5.x will likely be using Points field types for
> numeric fields, and although those are faster than what Solr currently
> uses, I doubt it could explain that difference.  The implication here is
> that the ES systems are running with stock EC2 settings, not the tuned
> settings ... but I'd like you to confirm that.  Same Java version as
> with Solr?  IMHO, Java itself is more likely to cause issues like you
> saw than Solr.
>
> > I’m writing this for a few reasons:
> >
> > 1.   The performance difference was so crazy I really feel like this
> should really be broader knowledge.
>
> Definitely agree!  I would be very interested in learning which of the
> tunables you changed were major contributors to the improvement.  If it
> turns out that Solr's code is sub-optimal in some way, maybe we can fix it.
>
> > 2.   If anyone is aware of anything that changed in Lucene between
> 5.4 and 6.x that could explain why Elasticsearch wasn’t suffering from
> this? If it’s the clocksource that’s the issue, there’s an implication that
> Solr was using tons more system calls like gettimeofday that the EC2 (xen)
> hypervisor doesn’t allow in userspace.
>
> I had not considered the performance regression in 6.4.0 and 6.4.1 that
> Erick mentioned.  Were you still running Solr 5.4, or was it a 6.x version?
>
> =
>
> Specific thoughts on the tuning:
>
> The noatime option is very good to use.  I also use nodiratime on my
> systems.  Turning these off can have *massive* impacts on disk
> performance.  If these are the source of the speedup, then the machine
> doesn't have enough spare memory.
>
> I'd be wary of the "nobarrier" mount option.  If the underlying storage
> has battery-backed write caches, or is SSD without write caching, it
> wouldn't be a problem.  Here's info about the "discard" mount option, I
> don't know whether it applies to your amazon storage:
>
>    discard/nodiscard
>           Controls whether ext4 should issue discard/TRIM commands to the
>           underlying block device when blocks are freed.  This is useful
>           for SSD devices and sparse/thinly-provisioned LUNs, but it is
>           off by default until sufficient testing has been done.
>
> The network tunables would have more of an effect in a distributed
> environment like EC2 than they would on a LAN.
>
> Thanks,
> Shawn
>
>


Re: Solr performance on EC2 linux

2017-05-01 Thread Shawn Heisey
On 4/28/2017 10:09 AM, Jeff Wartes wrote:
> tldr: Recently, I tried moving an existing solrcloud configuration from a 
> local datacenter to EC2. Performance was roughly 1/10th what I’d expected, 
> until I applied a bunch of linux tweaks.

How very strange.  I knew virtualization would have overhead, possibly
even measurable overhead, but that's insane.  Running on bare metal is
always better if you can do it.  I would be curious what would happen on
your original install if you applied similar tuning to that.  Would you
see a speedup there?

> Interestingly, a coworker playing with a ElasticSearch (ES 5.x, so a much 
> more recent release) alternate implementation of the same index was not 
> seeing this high-system-time behavior on EC2, and was getting throughput 
> consistent with our general expectations.

That's even weirder.  ES 5.x will likely be using Points field types for
numeric fields, and although those are faster than what Solr currently
uses, I doubt it could explain that difference.  The implication here is
that the ES systems are running with stock EC2 settings, not the tuned
settings ... but I'd like you to confirm that.  Same Java version as
with Solr?  IMHO, Java itself is more likely to cause issues like you
saw than Solr.

> I’m writing this for a few reasons:
>
> 1.   The performance difference was so crazy I really feel like this 
> should really be broader knowledge.

Definitely agree!  I would be very interested in learning which of the
tunables you changed were major contributors to the improvement.  If it
turns out that Solr's code is sub-optimal in some way, maybe we can fix it.

> 2.   If anyone is aware of anything that changed in Lucene between 5.4 
> and 6.x that could explain why Elasticsearch wasn’t suffering from this? If 
> it’s the clocksource that’s the issue, there’s an implication that Solr was 
> using tons more system calls like gettimeofday that the EC2 (xen) hypervisor 
> doesn’t allow in userspace.

I had not considered the performance regression in 6.4.0 and 6.4.1 that
Erick mentioned.  Were you still running Solr 5.4, or was it a 6.x version?

=

Specific thoughts on the tuning:

The noatime option is very good to use.  I also use nodiratime on my
systems.  Turning these off can have *massive* impacts on disk
performance.  If these are the source of the speedup, then the machine
doesn't have enough spare memory.

I'd be wary of the "nobarrier" mount option.  If the underlying storage
has battery-backed write caches, or is SSD without write caching, it
wouldn't be a problem.  Here's info about the "discard" mount option, I
don't know whether it applies to your amazon storage:

   discard/nodiscard
          Controls whether ext4 should issue discard/TRIM commands to the
          underlying block device when blocks are freed.  This is useful
          for SSD devices and sparse/thinly-provisioned LUNs, but it is
          off by default until sufficient testing has been done.

The network tunables would have more of an effect in a distributed
environment like EC2 than they would on a LAN.
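For reference, a hypothetical /etc/fstab line using the safer options from this discussion (the device and mount point are placeholders; nobarrier is deliberately left out for the reasons above):

```
# ext4 data volume for Solr: skip atime/diratime updates, keep write barriers
/dev/xvdf  /var/solr/data  ext4  defaults,noatime,nodiratime  0  2
```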

Thanks,
Shawn



Re: After upgrade to Solr 6.5, q.op=AND affects filter query differently than in older version

2017-05-01 Thread Shawn Heisey
On 4/26/2017 1:04 PM, Andy C wrote:
> I'm looking at upgrading the version of Solr used with our application from
> 5.3 to 6.5.
>
> Having an issue with a change in the behavior of one of the filter queries
> we generate.
>
> The field "ctindex" is only present in a subset of documents. It basically
> contains a user id. For those documents where it is present, I only want
> documents returned where the ctindex value matches the id of the user
> performing the search. Documents with no ctindex value should be returned
> as well.
>
> This is implemented through a filter query that excludes documents that
> contain some other value in the ctindex field: fq=(-ctindex:({* TO "MyId"}
> OR {"MyId" TO *}))

I am surprised that this works in 5.3.  The crux of the problem is that
fully negative query clauses do not actually work.

Here's the best-performing query that gives you the results you want:

fq=ctindex:myId OR (*:* -ctindex:[* TO *])

The *:* is needed in the second clause to give the query a starting
point of all documents, from which is subtracted all documents where
ctindex has a value.  Without the "all docs" starting point, you are
subtracting from nothing, which yields nothing.

You may notice that this query works perfectly, and wonder why:

fq=-ctindex:[* TO *]

This works because on such a simple query, Solr is able to detect that
it is fully negated, so it implicitly adds the *:* starting point for
you.  As soon as you implement any kind of complexity (multiple clauses,
parentheses, etc) that detection doesn't work.
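The set semantics here can be illustrated with a tiny simulation (the document ids and ctindex values are made up):

```python
# Simulate Lucene's boolean set semantics over a tiny corpus.
docs = [
    {"id": 1, "ctindex": "myId"},
    {"id": 2, "ctindex": "otherId"},
    {"id": 3},                      # no ctindex value
]

all_docs = {d["id"] for d in docs}
has_ctindex = {d["id"] for d in docs if "ctindex" in d}
matches_my_id = {d["id"] for d in docs if d.get("ctindex") == "myId"}

# Raw "-ctindex:[* TO *]" with no positive clause: subtracting from nothing.
purely_negative = set() - has_ctindex            # empty set
# "*:* -ctindex:[* TO *]": start from all docs, then subtract.
with_all_docs = all_docs - has_ctindex           # {3}
# fq=ctindex:myId OR (*:* -ctindex:[* TO *])
result = matches_my_id | with_all_docs           # {1, 3}
```

Doc 1 matches on the id, doc 3 comes in through the "no value" clause, and doc 2 is excluded, which is exactly the behavior the filter query is after.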

Thanks,
Shawn



Re: recommended zookeeper version for solr cloud

2017-05-01 Thread Shawn Heisey
On 4/26/2017 3:44 AM, David Michael Gang wrote:
> Which version of external zookeper is recommended to use in production
> environments? 3.4.6 which is the version shipped with solr or 3.4.10
> which is the latest stable?

If it were me, I would use the latest.  The list of bugs fixed in each
ZK version after 3.4.6 is quite long.

In 3.4.10, the use of certain 4lw (four letter words) is disabled by
default by ZOOKEEPER-2693.  I do not know whether Solr uses any of the
disabled 4lw commands, but because Solr includes native ZK client
access, I would assume that it does not.  If you find that Solr doesn't
work correctly, you can try the "4lw.commands.whitelist=*" config option
to re-enable them and restart zookeeper.

http://zookeeper.apache.org/doc/r3.4.10/zookeeperAdmin.html
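If 4lw commands do turn out to be needed, this is a one-line zoo.cfg change; whitelisting only the commands you actually use (the ones below are examples) is safer than "*":

```
# zoo.cfg (ZooKeeper 3.4.10+): re-enable specific four-letter-word commands
4lw.commands.whitelist=mntr,conf,ruok
```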
 
Thanks,
Shawn



Re: Troubleshooting solr errors

2017-05-01 Thread Shawn Heisey
On 4/25/2017 12:05 PM, Daniel Miller wrote:
> The problem isn't a particular email message - I get a cascade of
> those errors (every time a new message is received) once the server
> "breaks".  The fix is to restart the server.  I did find a Java heap
> error in the log - so I've increased the memory allocation (now to
> -Xms512m -Xmx2048m).  I had thought that a heap failure would result
> in "simple" termination - and that systemd would restart it
> appropriately - but obviously I'm missing something.

Erick covered some of this already:

The init script that the service installer script installs on a
non-windows system can start Solr, but it will not automatically restart
it if it dies.  That would require you to write something special,
probably a very custom systemd service specification, rather than use
the init script.  Automatically restarting on death is not a good idea
-- it is VERY likely that whatever caused the death is going to happen
again.

Another detail, at least on non-windows systems, is that recent Solr
versions include a script that kills the process on OutOfMemoryError
(OOME).  This is done because program operation is completely
unpredictable after that error occurs -- we have no way of knowing what
Solr will do.  There's an issue in Jira to add OOME killing to the
Windows script.

FYI, the stacktrace from an OutOfMemoryError regarding the heap is
highly unlikely to give you anything useful about why the process ran
out of memory, since *any* memory allocation in any software running in
the JVM can trigger the error.

Other errors besides OOME should never terminate Solr unless there's an
enormous bug somewhere.  That bug might be in Java itself, or even the OS.

Thanks,
Shawn



Re: Building Solr greater than 6.2.1

2017-05-01 Thread Ryan Yacyshyn
I was using Java 8 all along but more specifically, it was 1.8.0_25 (full
details below).

java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

I initially didn't think it was my Java version so I just cleared my ivy
cache and tried building again but it failed. Only after updating to:

java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

did it work.

Regards,
Ryan



On Mon, 1 May 2017 at 20:44 Shawn Heisey  wrote:

> On 5/1/2017 6:34 AM, Ryan Yacyshyn wrote:
> > Thanks Alex, it's working now. I had to update Java.
>
> What version were you using?  Lucene/Solr 6 requires Java 8.  I don't
> think that building 6.2.1 would have been successful if it weren't Java 8.
>
> I'm not familiar with any specific Java release requirements (more
> specific than version 8) for any 6.x version.
>
> Thanks,
> Shawn
>
>


Re: Is it expected for Synonyms to work vice-versa

2017-05-01 Thread ravi432
I'm also getting the same results with the following scenario:

Anderson window => american craftsman.

When I type "Anderson window" I want Solr to return results for Anderson
window and american craftsman, but it is giving only Anderson window.

But when I type "american craftsman", Solr returns both Anderson window and
american craftsman.

By the way, I am using synonyms at index time.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-expected-for-Synonyms-to-work-vice-versa-tp4332720p4332733.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Building Solr greater than 6.2.1

2017-05-01 Thread Shawn Heisey
On 5/1/2017 6:34 AM, Ryan Yacyshyn wrote:
> Thanks Alex, it's working now. I had to update Java. 

What version were you using?  Lucene/Solr 6 requires Java 8.  I don't
think that building 6.2.1 would have been successful if it weren't Java 8.

I'm not familiar with any specific Java release requirements (more
specific than version 8) for any 6.x version.

Thanks,
Shawn



Re: Building Solr greater than 6.2.1

2017-05-01 Thread Ryan Yacyshyn
Thanks Alex, it's working now. I had to update Java.

Regards,
Ryan



On Mon, 1 May 2017 at 14:48 Alexandre Rafalovitch 
wrote:

> Make sure your Java is latest update. Seriously
>
> Also, if still failing, try blowing away your Ivy cache.
>
> Regards,
> Alex
>
> On 1 May 2017 6:34 AM, "Ryan Yacyshyn"  wrote:
>
> > Hi all,
> >
> > I'm trying to build Solr 6.5.1 but it is failing. I'm able to
> > successfully build 6.2.1. I've tried 6.4.0, 6.4.2, and 6.5.1 but the
> build
> > fails. I'm not sure what the issue could be. I'm running `ant server` in
> > the solr dir and this is where it fails:
> >
> > ivy-configure:
> > [ivy:configure] :: loading settings :: file =
> > /Users/rye/lucene-solr2/lucene/top-level-ivy-settings.xml
> >
> > resolve:
> >
> > common.init:
> >
> > compile-lucene-core:
> >
> > init:
> >
> > -clover.disable:
> >
> > -clover.load:
> >
> > -clover.classpath:
> >
> > -clover.setup:
> >
> > clover:
> >
> > compile-core:
> >
> > -clover.disable:
> >
> > -clover.load:
> >
> > -clover.classpath:
> >
> > -clover.setup:
> >
> > clover:
> >
> > common.compile-core:
> > [mkdir] Created dir:
> > /Users/rye/lucene-solr2/lucene/build/test-framework/classes/java
> > [javac] Compiling 186 source files to
> > /Users/rye/lucene-solr2/lucene/build/test-framework/classes/java
> > [javac]
> > /Users/rye/lucene-solr2/lucene/test-framework/src/
> > java/org/apache/lucene/util/RamUsageTester.java:164:
> > error: no suitable method found for
> > collect(Collector)
> > [javac]   .collect(Collectors.toList());
> > [javac]   ^
> > [javac] method Stream.collect(Supplier,BiConsumer > super CAP#2>,BiConsumer) is not applicable
> > [javac]   (cannot infer type-variable(s) R#1
> > [javac] (actual and formal argument lists differ in length))
> > [javac] method Stream.collect(Collector > CAP#2,A,R#2>) is not applicable
> > [javac]   (cannot infer type-variable(s) R#2,A,CAP#3,T#2
> > [javac] (argument mismatch;
> Collector>
> > cannot be converted to Collector>))
> > [javac]   where R#1,T#1,R#2,A,T#2 are type-variables:
> > [javac] R#1 extends Object declared in method
> > collect(Supplier,BiConsumer > T#1>,BiConsumer)
> > [javac] T#1 extends Object declared in interface Stream
> > [javac] R#2 extends Object declared in method
> > collect(Collector)
> > [javac] A extends Object declared in method
> > collect(Collector)
> > [javac] T#2 extends Object declared in method toList()
> > [javac]   where CAP#1,CAP#2,CAP#3,CAP#4 are fresh type-variables:
> > [javac] CAP#1 extends Object from capture of ?
> > [javac] CAP#2 extends Object from capture of ?
> > [javac] CAP#3 extends Object from capture of ?
> > [javac] CAP#4 extends Object from capture of ?
> > [javac] Note: Some input files use or override a deprecated API.
> > [javac] Note: Recompile with -Xlint:deprecation for details.
> > [javac] 1 error
> >
> > BUILD FAILED
> > /Users/rye/lucene-solr2/solr/build.xml:463: The following error occurred
> > while executing this line:
> > /Users/rye/lucene-solr2/solr/common-build.xml:476: The following error
> > occurred while executing this line:
> > /Users/rye/lucene-solr2/solr/contrib/map-reduce/build.xml:53: The
> > following
> > error occurred while executing this line:
> > /Users/rye/lucene-solr2/solr/contrib/morphlines-cell/build.xml:45: The
> > following error occurred while executing this line:
> > /Users/rye/lucene-solr2/solr/common-build.xml:443: The following error
> > occurred while executing this line:
> > /Users/rye/lucene-solr2/solr/test-framework/build.xml:35: The following
> > error occurred while executing this line:
> > /Users/rye/lucene-solr2/lucene/common-build.xml:767: The following error
> > occurred while executing this line:
> > /Users/rye/lucene-solr2/lucene/common-build.xml:501: The following error
> > occurred while executing this line:
> > /Users/rye/lucene-solr2/lucene/common-build.xml:1967: Compile failed; see
> > the compiler error output for details.
> >
> > Total time: 2 minutes 28 seconds
> >
> > Java version:
> >
> > java version "1.8.0_25"
> > Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
> > Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
> >
> > ant: Apache Ant(TM) version 1.10.0 compiled on December 27 2016
> > ivy: ivy-2.3.0.jar
> >
> > Any suggestions I can try?
> >
> > Regards,
> > Ryan
> >
>


Re: BooleanQuery and WordDelimiterFilter

2017-05-01 Thread Rick Leir
Avi,
Tell us the relevant field types you have in schema.xml.
You can also solve this all for yourself in the Solr Admin Analysis panel.
Cheers -- Rick

On May 1, 2017 2:34:31 AM EDT, Avi Steiner  wrote:
>Hi
>
>I have  a question regarding the use of query parser and BooleanQuery.
>
>I have 3 documents indexed.
>Doc1 contains the words huntman's and huntman
>Doc2 contains the word huntman's
>Doc3 contains the word huntman
>
>When I search for huntman's I get Doc1 and Doc2
>When I search for +huntman's I get Doc1, Doc2 and Doc3
>
>As far as I understand, when I search for huntman's it should return
>documents with both huntman and huntman's (using WordDelimiterFilter)
>I also know that plus sign means that the term must be in document and
>the absence of plus (or minus) sign means that the term may or may not
>be in document as explained here:
>https://lucidworks.com/2011/12/28/why-not-and-or-and-not/
>
>So I don't understand the combination of these two properties.
>I think I understand why +huntman's returns Doc3 as well, because it
>can be translated to +(huntman's OR huntman), which means: must be one
>of the following: huntman's or huntman.
>But I don't understand why Doc3 is not returned by huntman's as well.
>Isn't it translated to huntman's OR huntman?
>
>Thanks
>
>Avi
>
>
>
>This email and any attachments thereto may contain private,
>confidential, and privileged material for the sole use of the intended
>recipient. Any review, copying, or distribution of this email (or any
>attachments thereto) by others is strictly prohibited. If you are not
>the intended recipient, please contact the sender immediately and
>permanently delete the original and any copies of this email and any
>attachments thereto.

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Slow indexing speed when collection size is large

2017-05-01 Thread Rick Leir
Zheng,
Are you POSTing using curl? Get several processes working in parallel to get a 
small boost. Solrj should speed you up a bit too (numbers anyone?). How many 
documents do you bundle in a POST? 

Do you have lots of RAM? Sharding?
Cheers -- Rick

On April 30, 2017 10:39:29 PM EDT, Zheng Lin Edwin Yeo  
wrote:
>Hi,
>
>I'm using Solr 6.4.2.
>
>Would like to check, if there are alot of collections in my Solr which
>has
>very large index size, will the indexing speed be affected?
>
>Currently, I have created a new collections in Solr which has several
>collections with very large index size, and the indexing speed is much
>slower than expected.
>
>Regards,
>Edwin

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

RE: Term no longer matches if PositionLengthAttr is set to two

2017-05-01 Thread Markus Jelsma
Hello again, apologies for cross-posting and having to get back to this 
unsolved problem.

Initially I thought this was a problem I have with, or in, Lucene. Maybe not,
so is this a problem in Solr? Has anyone here seen this problem before?

Many thanks,
Markus

-Original message-
> From:Markus Jelsma 
> Sent: Tuesday 25th April 2017 13:40
> To: java-u...@lucene.apache.org
> Subject: Term no longer matches if PositionLengthAttr is set to two
> 
> Hello,
> 
> We have a decompounder and recently implemented the PositionLengthAttribute 
> in it and set it to 2 for a two-word compound such as drinkwater (drinking 
> water in dutch). The decompounder runs both at index- and query-time on Solr 
> 6.5.0.
> 
> The problem is, q=content_nl:drinkwater no longer returns documents 
> containing drinkwater when posLenAtt = 2 at query time.
> 
> This is Solr's debug output for drinkwater with posLenAtt = 2:
> 
> content_nl:drinkwater
> content_nl:drinkwater
> SynonymQuery(Synonym())
> Synonym()
> 
> This is the output where i reverted the decompounder, thus a posLenAtt = 1:
> 
> content_nl:drinkwater
> content_nl:drinkwater
> SynonymQuery(Synonym(content_nl:drink 
> content_nl:drinkwater)) content_nl:water
> Synonym(content_nl:drink 
> content_nl:drinkwater) content_nl:water
> 
> The indexed terms still have posLenAtt = 2, but having a posLenAtt = 2 at 
> query time seems to be a problem.
> 
> Any thoughts on this issue? Is it a bug? Do i not understand 
> PositionLengthAttribute? Why does it affect term/document matching? At query 
> time but not at index time?
> 
> Many thanks,
> Markus
> 
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 
> 


Is it expected for Synonyms to work vice-versa

2017-05-01 Thread Atita Arora
Hi,

I have this strange issues happening today where I specified certain
keyword to match as synonym word as :

(^|[^a-zA-Z0-9])[cC][#]([^a-zA-Z0-9]|$)=>$1csharp$2

Which essentially means anyone searching for "C#" should be matched with a
document containing "csharp" too.

Now I have run into something mysterious (at least it's a mystery for me!
Not sure if that's the expected behaviour): someone searching for
"sharp" is matched with docs containing "#", whereas no other
configuration specifies this elsewhere.

Is this normal / expected ?

Please guide !

TIA -
Atita
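The left-hand side of that mapping can at least be sanity-checked in isolation; a quick Python check (the sample strings are made up) shows the rule itself never fires on "sharp", so the behaviour likely comes from elsewhere in the analysis chain:

```python
import re

# The synonym rule's pattern and replacement, exactly as given above.
pattern = re.compile(r'(^|[^a-zA-Z0-9])[cC][#]([^a-zA-Z0-9]|$)')

def rewrite(text):
    # Replace "C#" (with non-alphanumeric boundaries) by "csharp",
    # preserving the surrounding boundary characters via backreferences.
    return pattern.sub(r'\1csharp\2', text)

print(rewrite("I code in C# daily"))  # I code in csharp daily
print(rewrite("sharp tools"))         # unchanged: the rule never matches "sharp"
```

If the regex is clean, the next place to look is the Analysis screen in the admin UI, checking which filter in the chain turns "sharp" into a match for "#".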


RE: pagination of results of grouping by more than one field

2017-05-01 Thread Mikhail Ibraheem
Hi,
Any clue?

Thanks


-Original Message-
From: Mikhail Ibraheem 
Sent: Sunday, April 30, 2017 10:09 AM
To: solr-user@lucene.apache.org
Subject: pagination of results of grouping by more than one field

Hi,

I have a problem where I need to group by X and Y and aggregate on Z, and I
need to paginate over the results.

The results aren't flat, they are in a hierarchy, so how can I flatten the
results so we can paginate over them for each combination of X,Y, like:

Computers, Computer Laptops, 5.684790733920929E10

Computers, PE_Server, 1.1207993365851181E10

Computers, Monitors, 1.2723246848002455E9

Data Communications Hardware, Datacom Hardware, 6.3539691650598495E10

 

From the sample of the results:

 

"X":{
  "buckets":[{
      "val":"Computers",
      "count":981466,
      "Y":{
        "buckets":[{
            "val":"Computer Laptops",
            "count":391064,
            "sum":5.684790733920929E10},
          {
            "val":"PE_Server",
            "count":218148,
            "sum":1.1207993365851181E10},
          {
            "val":"Monitors",
            "count":122176,
            "sum":1.2723246848002455E9}]}},
    {
      "val":"Data Communications Hardware",
      "count":428230,
      "Y":{
        "buckets":[{
            "val":"Datacom Hardware",
            "count":428230,
            "sum":6.3539691650598495E10}]}},
    {
      "val":"Leasehold Improvements",
      "count":33677,
      "Y":{
        "buckets":[{
            "val":"Leasehold improvements",
            "count":33676,
            "sum":1.6308392462957385E12},
          {
            "val":"Electrical & Air Conditioning",
            "count":1,
            "sum":4505.0}]}},
    {

 
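One client-side workaround is to flatten the nested buckets into (X, Y, sum) rows and paginate over those. A sketch in plain Python against a cut-down copy of the response above (note this only works when the full bucket set fits in one response):

```python
# Cut-down copy of the nested facet response (plain Python literals).
response = {
    "X": {"buckets": [
        {"val": "Computers", "count": 981466, "Y": {"buckets": [
            {"val": "Computer Laptops", "count": 391064, "sum": 5.684790733920929E10},
            {"val": "PE_Server", "count": 218148, "sum": 1.1207993365851181E10},
        ]}},
        {"val": "Data Communications Hardware", "count": 428230, "Y": {"buckets": [
            {"val": "Datacom Hardware", "count": 428230, "sum": 6.3539691650598495E10},
        ]}},
    ]}
}

def flatten(resp):
    # Walk the X buckets, then each nested Y bucket, yielding flat rows.
    for x in resp["X"]["buckets"]:
        for y in x["Y"]["buckets"]:
            yield (x["val"], y["val"], y["sum"])

def page(rows, page_num, page_size):
    # Ordinary offset/limit slicing over the flattened rows.
    start = page_num * page_size
    return rows[start:start + page_size]

rows = list(flatten(response))
print(page(rows, 0, 2))  # first page: two (X, Y, sum) rows
```

For deep pagination where fetching all buckets is impractical, a server-side approach such as a Streaming Expressions rollup over X and Y may be a better fit, since it produces flat tuples to begin with.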


Re: Building Solr greater than 6.2.1

2017-05-01 Thread Alexandre Rafalovitch
Make sure your Java is the latest update. Seriously.

Also, if still failing, try blowing away your Ivy cache.

Regards,
Alex

On 1 May 2017 6:34 AM, "Ryan Yacyshyn"  wrote:

> Hi all,
>
> I'm trying to build Solr 6.5.1 but it's failing. I'm able to
> successfully build 6.2.1. I've tried 6.4.0, 6.4.2, and 6.5.1, but the build
> fails each time. I'm not sure what the issue could be. I'm running `ant server` in
> the solr dir and this is where it fails:
>
> ivy-configure:
> [ivy:configure] :: loading settings :: file =
> /Users/rye/lucene-solr2/lucene/top-level-ivy-settings.xml
>
> resolve:
>
> common.init:
>
> compile-lucene-core:
>
> init:
>
> -clover.disable:
>
> -clover.load:
>
> -clover.classpath:
>
> -clover.setup:
>
> clover:
>
> compile-core:
>
> -clover.disable:
>
> -clover.load:
>
> -clover.classpath:
>
> -clover.setup:
>
> clover:
>
> common.compile-core:
> [mkdir] Created dir:
> /Users/rye/lucene-solr2/lucene/build/test-framework/classes/java
> [javac] Compiling 186 source files to
> /Users/rye/lucene-solr2/lucene/build/test-framework/classes/java
> [javac]
> /Users/rye/lucene-solr2/lucene/test-framework/src/
> java/org/apache/lucene/util/RamUsageTester.java:164:
> error: no suitable method found for
> collect(Collector)
> [javac]   .collect(Collectors.toList());
> [javac]   ^
> [javac] method Stream.collect(Supplier,BiConsumer super CAP#2>,BiConsumer) is not applicable
> [javac]   (cannot infer type-variable(s) R#1
> [javac] (actual and formal argument lists differ in length))
> [javac] method Stream.collect(Collector CAP#2,A,R#2>) is not applicable
> [javac]   (cannot infer type-variable(s) R#2,A,CAP#3,T#2
> [javac] (argument mismatch; Collector>
> cannot be converted to Collector>))
> [javac]   where R#1,T#1,R#2,A,T#2 are type-variables:
> [javac] R#1 extends Object declared in method
> collect(Supplier,BiConsumer T#1>,BiConsumer)
> [javac] T#1 extends Object declared in interface Stream
> [javac] R#2 extends Object declared in method
> collect(Collector)
> [javac] A extends Object declared in method
> collect(Collector)
> [javac] T#2 extends Object declared in method toList()
> [javac]   where CAP#1,CAP#2,CAP#3,CAP#4 are fresh type-variables:
> [javac] CAP#1 extends Object from capture of ?
> [javac] CAP#2 extends Object from capture of ?
> [javac] CAP#3 extends Object from capture of ?
> [javac] CAP#4 extends Object from capture of ?
> [javac] Note: Some input files use or override a deprecated API.
> [javac] Note: Recompile with -Xlint:deprecation for details.
> [javac] 1 error
>
> BUILD FAILED
> /Users/rye/lucene-solr2/solr/build.xml:463: The following error occurred
> while executing this line:
> /Users/rye/lucene-solr2/solr/common-build.xml:476: The following error
> occurred while executing this line:
> /Users/rye/lucene-solr2/solr/contrib/map-reduce/build.xml:53: The
> following
> error occurred while executing this line:
> /Users/rye/lucene-solr2/solr/contrib/morphlines-cell/build.xml:45: The
> following error occurred while executing this line:
> /Users/rye/lucene-solr2/solr/common-build.xml:443: The following error
> occurred while executing this line:
> /Users/rye/lucene-solr2/solr/test-framework/build.xml:35: The following
> error occurred while executing this line:
> /Users/rye/lucene-solr2/lucene/common-build.xml:767: The following error
> occurred while executing this line:
> /Users/rye/lucene-solr2/lucene/common-build.xml:501: The following error
> occurred while executing this line:
> /Users/rye/lucene-solr2/lucene/common-build.xml:1967: Compile failed; see
> the compiler error output for details.
>
> Total time: 2 minutes 28 seconds
>
> Java version:
>
> java version "1.8.0_25"
> Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
>
> ant: Apache Ant(TM) version 1.10.0 compiled on December 27 2016
> ivy: ivy-2.3.0.jar
>
> Any suggestions I can try?
>
> Regards,
> Ryan
>


BooleanQuery and WordDelimiterFilter

2017-05-01 Thread Avi Steiner
Hi

I have a question regarding the use of the query parser and BooleanQuery.

I have 3 documents indexed.
Doc1 contains the words huntman's and huntman
Doc2 contains the word huntman's
Doc3 contains the word huntman

When I search for huntman's I get Doc1 and Doc2
When I search for +huntman's I get Doc1, Doc2 and Doc3

As far as I understand, when I search for huntman's it should return documents
containing either huntman or huntman's (because of WordDelimiterFilter).
I also know that plus sign means that the term must be in document and the 
absence of plus (or minus) sign means that the term may or may not be in 
document as explained here: 
https://lucidworks.com/2011/12/28/why-not-and-or-and-not/

So I don't understand the combination of these two properties.
I think I understand why +huntman's returns Doc3 as well, because it can be 
translated to +(huntman's OR huntman), which means: must be one of the 
following: huntman's or huntman.
But I don't understand why Doc3 is not returned by huntman's as well. Isn't it 
translated to huntman's OR huntman?

Thanks

Avi
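A toy model of the Boolean semantics in plain Python (sets standing in for the index; the expansion {"huntman's", "huntman"} is an assumption, not necessarily what the actual analysis chain emits) shows that SHOULD and MUST over a single expanded clause should match the same documents:

```python
# Toy model: each doc is the set of terms it contains after analysis.
docs = {
    "Doc1": {"huntman's", "huntman"},
    "Doc2": {"huntman's"},
    "Doc3": {"huntman"},
}

# Assumed query-time expansion of huntman's (an assumption for this sketch).
expansion = {"huntman's", "huntman"}

# Bare clause: SHOULD over the expansion, i.e. match any doc that contains
# at least one of the expanded terms.
should_matches = [d for d, terms in docs.items() if expansion & terms]

# +(...) clause: MUST over the same single expansion; with only one clause
# the match set is necessarily identical.
must_matches = [d for d, terms in docs.items() if expansion & terms]

print(should_matches == must_matches)  # True: all three docs in both cases
```

Since SHOULD and MUST coincide for a single clause, the observed difference suggests the bare huntman's is being parsed into a different query shape (for example a phrase query over the split tokens, which Doc3 would not match); running both queries with debugQuery=true would show the actual parsed forms.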


