Sql entity processor sortedmapbackedcache out of memory issue

2019-04-08 Thread Srinivas Kashyap
Hello,

I'm using DIH to index the data and the structure of the DIH is like below for 
solr core:


[DIH data-config XML stripped by the list archive; the root entity has 16 child entities]


During indexing, the number of requests made to the database was high (17 queries 
to process one document) and was using up most of the database's connections, 
thereby blocking our web application.

To tackle this, we implemented SORTEDMAPBACKEDCACHE via the cacheImpl parameter to 
reduce the number of requests to the database.
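Since the archive stripped the original data-config XML, here is a minimal sketch of what a cached child entity can look like. The entity names, SQL, and key fields below are placeholders for illustration, not the poster's actual configuration:

```xml
<dataConfig>
  <document>
    <entity name="parent" query="SELECT ID, NAME FROM PARENT_TABLE">
      <!-- With cacheImpl, the child query runs once and joins are served
           from an in-memory map instead of one query per parent row -->
      <entity name="child1"
              processor="SqlEntityProcessor"
              query="SELECT PARENT_ID, DETAIL FROM CHILD_TABLE"
              cacheKey="PARENT_ID"
              cacheLookup="parent.ID"
              cacheImpl="SortedMapBackedCache"/>
    </entity>
  </document>
</dataConfig>
```

Note that SortedMapBackedCache holds each child entity's entire result set in the JVM heap, which is why memory usage scales with the number of records per entity.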

[entity definitions using cacheImpl="SortedMapBackedCache" stripped by the list archive]


We have 8GB of physical memory (RAM), with 5GB of it allocated to the JVM. When we 
do a full import, only 17 requests are made to the database; however, memory 
consumption shoots up and the JVM runs out of memory. Whether we hit out of memory 
depends on how many records each entity brings into memory. For the Dev and QA 
environments, the above memory configuration is sufficient. When we move to 
production, we have to increase it to around 16GB of RAM and a 12GB JVM heap.

Is there any logic/configuration to limit the memory usage?

Thanks and Regards,
Srinivas Kashyap


DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.


Re: solr wild card search

2019-04-08 Thread Anil
Thanks Erick. I escaped the colon and it worked. My bad, I missed it :)
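For reference, a small client-side sketch of the escaping being discussed. The character list below follows the standard query parser's special characters, but treat it as illustrative rather than exhaustive:

```python
# Characters with special meaning to Solr's standard query parser
SOLR_SPECIAL = r'\+-!():^[]"{}~*?|&;/ '

def escape_solr_term(term: str) -> str:
    """Backslash-escape query-parser special characters in a term."""
    return ''.join('\\' + ch if ch in SOLR_SPECIAL else ch for ch in term)

# The trailing * is appended unescaped so it still acts as a wildcard
query = "url:" + escape_solr_term("https://facebook.com/posts") + "*"
print(query)  # url:https\:\/\/facebook.com\/posts*
```

Escaping the term but not the trailing `*` keeps the wildcard active while neutralizing the colon that broke the original query.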

On Mon, 8 Apr 2019 at 21:55, Erick Erickson  wrote:

> See:
> https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html
>
> > On Apr 8, 2019, at 9:04 AM, Anil  wrote:
> >
> > Hi Eric,
> >
> > url:"https://facebook.com/posts/123456" is working.
> > url:https://facebook.com/posts * is
> not
> > working.
> >
> > i tried to escape forward slash  and dot (.).. didnt help. i missed
> colon.
> > let me try. Thanks.
> >
> > Regards,
> > Anil
> >
> > On Mon, 8 Apr 2019 at 21:02, Erick Erickson 
> wrote:
> >
> >> Show us the exact search you’re using, both the failure and success case
> >> please. Most likely you need to escape things like the colon…
> >>
> >> Best,
> >> Erick
> >>
> >>> On Apr 8, 2019, at 8:19 AM, Anil  wrote:
> >>>
> >>> Hi Team,
> >>>
> >>> Good Morning.
> >>>
> >>> I am storing url in string field. wild card search is giving following
> >>> error.
> >>>
> >>> "error":{
> >>>
> >>>   "metadata":[
> >>>
> >>> "error-class","org.apache.solr.common.SolrException",
> >>>
> >>> "root-error-class","org.apache.solr.parser.ParseException"],
> >>>
> >>>   "msg":"org.apache.solr.search.SyntaxError: Cannot parse 'url:
> >>> https://facebook.com/posts/123456': Encountered \" \":\" \": \"\" at
> >> line
> >>> 1, column 15.\nWas expecting one of:\n \n ...\n
> 
> >>> ...\n ...\n\"+\" ...\n\"-\" ...\n ...\n
> >>> \"(\" ...\n\"*\" ...\n\"^\" ...\n ...\n
> >>> ...\n ...\n ...\n ...\n
> >>>  ...\n\"[\" ...\n\"{\" ...\n ...\n
> >>> \"filter(\" ...\n ...\n",
> >>>
> >>>   "code":400}}
> >>>
> >>>
> >>> only exact match on url field is working.
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Anil
> >>
> >>
>
>


Re: Solr Cache clear

2019-04-08 Thread Alexandre Rafalovitch
You may have warming queries to prepopulate your cache. Check your
solrconfig.xml.

Regards,
Alex
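For context, the prepopulation Alex mentions usually comes from a QuerySenderListener in solrconfig.xml; a minimal sketch (the query itself is a placeholder):

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- Fired against every new searcher to warm it -->
    <lst><str name="q">*:*</str><str name="sort">id asc</str></lst>
  </arr>
</listener>
```

Also note that each cache's autowarmCount causes a new searcher to be seeded from the old searcher's cache, so a reload alone may not leave the caches empty.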

On Mon, Apr 8, 2019, 4:16 PM Lewin Joy (TMNA),  wrote:

> ** PROTECTED - Confidential (internal parties only)
> How do I clear the solr caches without restarting Solr cluster?
> Is there a way?
> I tried reloading the collection. But, it did not help.
>
> Thanks,
> Lewin
>
>


Re: Solr Cache clear

2019-04-08 Thread Shawn Heisey

On 4/8/2019 2:14 PM, Lewin Joy (TMNA) wrote:

How do I clear the solr caches without restarting Solr cluster?
Is there a way?
I tried reloading the collection. But, it did not help.


When I reload a core on a test setup (solr 7.4.0), I see cache sizes reset.

What evidence are you seeing that reloading doesn't work?

Thanks,
Shawn


Solr Cache clear

2019-04-08 Thread Lewin Joy (TMNA)
** PROTECTED - Confidential (internal parties only)
How do I clear the solr caches without restarting Solr cluster?
Is there a way?
I tried reloading the collection. But, it did not help.

Thanks,
Lewin



Moving index from stand-alone Solr 6.6.0 to 3 node Solr Cloud 6.6.0 with Zookeeper

2019-04-08 Thread Kevin Cunningham
Hi all,

I'm sure I've done this before but this seems to be falling down a bit and I 
was wondering if anyone had any helpful ideas.
I have a large index (51GB) that exists in a 4 node Solr Cloud instance. The 
reprocessing for this takes a long time and so we normally reindex on a 
secondary cluster and swap them out.
I have reindexed to a single Solr 6.6.0 index and spun up a new 3 node Solr 
cluster with 1 shard and replication factor of 3.
I want to copy over the index and have it replicate to the rest of the cluster. 
I have taken a copy of the data directory from the reprocessed core and copied 
it into the leader's data directory. This shows up correctly as having a 51GB 
index and the documents are searchable.
I have tried the following curl commands to kick off replication:
curl http://localhost:8983/solr/solrCollection1/update -H "Content-Type: 
text/xml" --data-binary @test.xml
curl http://localhost:8983/solr/solrCollection1/update?stream.body=%3Ccommit/%3E

I've tried this a few times and had a few different results:
The index gets set to 0 and has the single record I commit

A timed index gets created (index.201904082111232) and index.properties then 
points to that

I had an issue with IndexWriter being closed

The index stays consistent and doesn't replicate

I've tried copying the index to both the leader and one other node to see if 
that helps but I'm faced with similar results as above.

Does anyone have any advice on how I can get this index moved and replicated 
onto this new cluster?
Thanks a lot!
Kevin.



Re: SOLR Text Field

2019-04-08 Thread Shawn Heisey

On 4/8/2019 10:27 AM, Dave Beckstrom wrote:

SOLR really should ship with a sample text field defined, even if commented
out and for example purposes only.  That would have been
most helpful.  Even a FAQ somewhere would have been helpful.


There are two example configs in the latest version of Solr (8.0.0). 
Some of the earlier versions include more than two.


In the latest download, check the solr-8.0.0/server/solr/configsets 
directory.  There will be two directories there, each of which contains 
a conf directory.


In the managed-schema file found in the conf directory, you will find 
multiple examples of text field types.  The managed-schema in the 
_default configset has the following type names that use the 
solr.TextField class:


text_ws, text_general, text_en, text_en_splitting, 
text_splitting_en_tight, text_general_rev, phonetic_en, lowercase, 
descendent_path, ancestor_path, delimited_payloads_float, 
delimited_payloads_int, delimited_payloads_string, text_ar, text_bg, 
text_ca, text_cjk, text_cz, text_da, text_de, text_el, text_es, text_eu, 
text_fa, text_fi, text_fr, text_ga, text_gl, text_hi, text_hu, text_hy, 
text_id, text_it, text_ja, text_ko, text_lv, text_nl, text_no, text_pt, 
text_ro, text_ru, text_sv, text_th, text_tr


There are also field definitions using most of the fieldType definitions 
in the example config.


Solr's example configs fall into the "kitchen sink" category.  They 
contain things that most users will NEVER need.


Thanks,
Shawn


Re: SOLR Text Field

2019-04-08 Thread Dave Beckstrom
Shawn,

I can't thank you enough for taking the time to reply to my question and
for the info you shared.

I don't believe I ever found one example by Googling of how to define a
simple text field in SOLR.  I saw some examples of Text_General but as you
saw it wasn't what I needed.

Based on the info you provided, I was able to get it working: Nutch now
crawls and indexes into SOLR without issue. I rebuilt all my collections
with the proper field definitions using your example.

SOLR really should ship with a sample text field defined, even if commented
out and for example purposes only.  That would have been
most helpful.  Even a FAQ somewhere would have been helpful.

Anyway, you're the best and thank you again

Best,

Dave Beckstrom

-- 
*Fig Leaf Software, Inc.* 
https://www.figleaf.com/ 
  

Full-Service Solutions Integrator








Re: solr wild card search

2019-04-08 Thread Erick Erickson
See: https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html

> On Apr 8, 2019, at 9:04 AM, Anil  wrote:
> 
> Hi Eric,
> 
> url:"https://facebook.com/posts/123456" is working.
> url:https://facebook.com/posts * is not
> working.
> 
> i tried to escape forward slash  and dot (.).. didnt help. i missed colon.
> let me try. Thanks.
> 
> Regards,
> Anil
> 
> On Mon, 8 Apr 2019 at 21:02, Erick Erickson  wrote:
> 
>> Show us the exact search you’re using, both the failure and success case
>> please. Most likely you need to escape things like the colon…
>> 
>> Best,
>> Erick
>> 
>>> On Apr 8, 2019, at 8:19 AM, Anil  wrote:
>>> 
>>> Hi Team,
>>> 
>>> Good Morning.
>>> 
>>> I am storing url in string field. wild card search is giving following
>>> error.
>>> 
>>> "error":{
>>> 
>>>   "metadata":[
>>> 
>>> "error-class","org.apache.solr.common.SolrException",
>>> 
>>> "root-error-class","org.apache.solr.parser.ParseException"],
>>> 
>>>   "msg":"org.apache.solr.search.SyntaxError: Cannot parse 'url:
>>> https://facebook.com/posts/123456': Encountered \" \":\" \": \"\" at
>> line
>>> 1, column 15.\nWas expecting one of:\n \n ...\n
>>> ...\n ...\n\"+\" ...\n\"-\" ...\n ...\n
>>> \"(\" ...\n\"*\" ...\n\"^\" ...\n ...\n
>>> ...\n ...\n ...\n ...\n
>>>  ...\n\"[\" ...\n\"{\" ...\n ...\n
>>> \"filter(\" ...\n ...\n",
>>> 
>>>   "code":400}}
>>> 
>>> 
>>> only exact match on url field is working.
>>> 
>>> 
>>> Thanks,
>>> 
>>> Anil
>> 
>> 



Re: solr wild card search

2019-04-08 Thread Anil
Hi Eric,

url:"https://facebook.com/posts/123456" is working.
url:https://facebook.com/posts * is not
working.

I tried to escape the forward slash and dot (.) but it didn't help. I missed the
colon. Let me try. Thanks.

Regards,
Anil

On Mon, 8 Apr 2019 at 21:02, Erick Erickson  wrote:

> Show us the exact search you’re using, both the failure and success case
> please. Most likely you need to escape things like the colon…
>
> Best,
> Erick
>
> > On Apr 8, 2019, at 8:19 AM, Anil  wrote:
> >
> > Hi Team,
> >
> > Good Morning.
> >
> > I am storing url in string field. wild card search is giving following
> > error.
> >
> > "error":{
> >
> >"metadata":[
> >
> >  "error-class","org.apache.solr.common.SolrException",
> >
> >  "root-error-class","org.apache.solr.parser.ParseException"],
> >
> >"msg":"org.apache.solr.search.SyntaxError: Cannot parse 'url:
> > https://facebook.com/posts/123456': Encountered \" \":\" \": \"\" at
> line
> > 1, column 15.\nWas expecting one of:\n \n ...\n
> > ...\n ...\n\"+\" ...\n\"-\" ...\n ...\n
> > \"(\" ...\n\"*\" ...\n\"^\" ...\n ...\n
> > ...\n ...\n ...\n ...\n
> >  ...\n\"[\" ...\n\"{\" ...\n ...\n
> > \"filter(\" ...\n ...\n",
> >
> >"code":400}}
> >
> >
> > only exact match on url field is working.
> >
> >
> > Thanks,
> >
> > Anil
>
>


Re: Moving index from stand-alone Solr 6.6.0 to 3 node Solr Cloud 6.6.0 with Zookeeper

2019-04-08 Thread Shawn Heisey

On 4/8/2019 10:06 AM, Shawn Heisey wrote:

* Make sure you have a copy of the source index directory.
* Do not copy the tlog directory from the source.
* Create the collection in the target cloud.
* Shut down the target cloud completely.
* Delete all the index directories in the cloud.
* Copy the source index directory to one of the cloud nodes.
* Start that cloud node up.  Make sure it is all working.
* Start up the other nodes.


At the "delete all the index directories in the cloud" step, I should 
have written "delete the contents of all data directories for the 
collection in the cloud" ... everything in data should be deleted, not 
just the index directory.  Don't want it replaying transaction logs when 
Solr starts!


Thanks,
Shawn


Re: Moving index from stand-alone Solr 6.6.0 to 3 node Solr Cloud 6.6.0 with Zookeeper

2019-04-08 Thread Shawn Heisey

On 4/8/2019 8:59 AM, kevinc wrote:

I have reindexed to a single Solr 6.6.0 index and spun up a new 3 node Solr
cluster with 1 shard and replication factor of 3.

I want to copy over the index and have it replicate to the rest of the
cluster. I have taken a copy of the data directory from the reprocessed core
and copied it into the leader's data directory. This shows up correctly as
having a 51GB index and the documents are searchable.

I have tried the following curl commands to kick off replication:

curl http://localhost:8983/solr/solrCollection1/update -H "Content-Type:
text/xml" --data-binary @test.xml
curl
http://localhost:8983/solr/solrCollection1/update?stream.body=%3Ccommit/%3E


I think the following is probably what you're going to want to do in 
order to transplant an existing index into a new cloud:


* Make sure you have a copy of the source index directory.
* Do not copy the tlog directory from the source.
* Create the collection in the target cloud.
* Shut down the target cloud completely.
* Delete all the index directories in the cloud.
* Copy the source index directory to one of the cloud nodes.
* Start that cloud node up.  Make sure it is all working.
* Start up the other nodes.

Once the other nodes are started, they will automatically notice that 
they don't have an index directory and will copy the index from the leader.
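Shawn's checklist above can be sketched as a small script. The paths and core-directory layout are assumptions for illustration; always keep a backup of the source index before running anything like this:

```python
import pathlib
import shutil

def transplant_index(source_core: pathlib.Path, target_core: pathlib.Path) -> None:
    """Replace a target core's data directory contents with a source index.

    source_core/target_core point at core directories, e.g.
    .../server/solr/collection1_shard1_replica1 (names are illustrative).
    """
    data = target_core / "data"
    # Clear everything under data/ (index AND tlog) so Solr does not
    # replay stale transaction logs on startup.
    if data.exists():
        shutil.rmtree(data)
    data.mkdir(parents=True)
    # Copy only the index directory from the source; skip its tlog.
    shutil.copytree(source_core / "data" / "index", data / "index")
```

This mirrors the steps of deleting the whole data directory (not just index) and copying only the source index; the remaining replicas then pull the index from the leader when they start.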


These instructions assume a single shard in both the source and the 
target.  If you are changing the number of shards, it will be a lot 
easier to simply reindex into the new cloud.


Erick's message indicates another way you could go ... create the new 
index with a single replica, get that working, and then use ADDREPLICA 
(part of the Collections API) to add more replicas.


Thanks,
Shawn


Re: Solr 8.0.0 - CPU usage 100% when indexed documents

2019-04-08 Thread Shawn Heisey

On 4/8/2019 7:22 AM, vishal patel wrote:
I have created two solr shards with 3 zoo keeper. First do upconfig in 
zoo keeper then start the both solr with different port then create a 
"actionscomments" collection using API call.


When I indexed one document in actionscomments, my CPU utilization go high.


You were asked how you are doing the indexing.  You still haven't 
provided that information. You said "AsiteSolrCloudManager" ... but when 
I google for that, the only thing that comes up is this email thread.  I 
have no idea what AsiteSolrCloudManager is.  One thing I *can* say is 
that it is not part of Solr.


I have attached my solrconfig.xml and schema.xml and also thread dump 
which got from solr admin GUI.


The schema and solrconfig came through.  The thread dump did not.  I'm 
surprised that ANY attachments made it to the list ... normally they 
don't.  The thread dump also did not come through on your first message. 
 If you need to share files, you'll need to find a mechanism other than 
attachments to do it.  File sharing websites work well.


So we don't have the thread dump.  But for a problem like this, a thread 
dump is not going to be helpful.  I've never seen anything in a Java 
thread dump to indicate which threads are using the most CPU.


Thanks,
Shawn


Re: solr wild card search

2019-04-08 Thread Erick Erickson
Show us the exact search you’re using, both the failure and success case 
please. Most likely you need to escape things like the colon…

Best,
Erick

> On Apr 8, 2019, at 8:19 AM, Anil  wrote:
> 
> Hi Team,
> 
> Good Morning.
> 
> I am storing url in string field. wild card search is giving following
> error.
> 
> "error":{
> 
>"metadata":[
> 
>  "error-class","org.apache.solr.common.SolrException",
> 
>  "root-error-class","org.apache.solr.parser.ParseException"],
> 
>"msg":"org.apache.solr.search.SyntaxError: Cannot parse 'url:
> https://facebook.com/posts/123456': Encountered \" \":\" \": \"\" at line
> 1, column 15.\nWas expecting one of:\n \n ...\n
> ...\n ...\n\"+\" ...\n\"-\" ...\n ...\n
> \"(\" ...\n\"*\" ...\n\"^\" ...\n ...\n
> ...\n ...\n ...\n ...\n
>  ...\n\"[\" ...\n\"{\" ...\n ...\n
> \"filter(\" ...\n ...\n",
> 
>"code":400}}
> 
> 
> only exact match on url field is working.
> 
> 
> Thanks,
> 
> Anil



solr wild card search

2019-04-08 Thread Anil
Hi Team,

Good Morning.

I am storing the URL in a string field. Wildcard search is giving the following
error.

"error":{

"metadata":[

  "error-class","org.apache.solr.common.SolrException",

  "root-error-class","org.apache.solr.parser.ParseException"],

"msg":"org.apache.solr.search.SyntaxError: Cannot parse 'url:
https://facebook.com/posts/123456': Encountered \" \":\" \": \"\" at line
1, column 15.\nWas expecting one of:\n \n ...\n
...\n ...\n\"+\" ...\n\"-\" ...\n ...\n
\"(\" ...\n\"*\" ...\n\"^\" ...\n ...\n
...\n ...\n ...\n ...\n
 ...\n\"[\" ...\n\"{\" ...\n ...\n
\"filter(\" ...\n ...\n",

"code":400}}


only exact match on url field is working.


Thanks,

Anil


Re: Is it possible to configure Solr to show time stamps without the 'Z' character at the end

2019-04-08 Thread Shawn Heisey

On 4/8/2019 4:38 AM, Miettinen Jaana (STAT) wrote:

I have a problem in Solr: I should add several (old) time stamps into my Solr 
documents, but all of them are in local time (UTC+2 or UTC+3 depending on the 
daylight-saving situation). By default Solr expects all time stamps to be in 
UTC and adds the 'Z' character to the end of the time stamp strings to 
indicate that the date should be considered UTC.

Is it possible to change this 'Z' notation? I would either want to get rid of 
the 'Z' or change it to denote UTC+2.


Solr uses UTC.  The "Z" is part of the ISO standard that Solr is using. 
I forget which ISO number it is.  So it's always going to be there when 
using a date field.



I noticed that there's variable SOLR_TIMEZONE in solr-7.6.0/bin/solr.in.sh-file. I 
changed it to  SOLR_TIMEZONE="EST", re-created my solr-servers, but nothing 
changed. Why was that configuration file ignored (I also changed the port to check 
whether it was ignored really) ? And what is the purpose of  SOLR_TIMEZONE-variable ?



The timezone information affects date math.  So when you have something 
like NOW/WEEK or NOW/DAY, Solr knows when a new day starts and can round 
the time correctly.


Timezone information does *NOT* affect the time that goes into the index 
or the display of information in search results.


If you want your local timezone in your output, you're going to need to 
do what programs on UNIX have been doing for decades -- translating the 
UTC time they can access to the configured timezone.  It is rare for 
Solr's results to be given directly to users -- it nearly always passes 
through a custom program.
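As a sketch of that client-side translation, the UTC "Z" timestamp from Solr can be shifted before display. A fixed UTC+2 offset is used here for simplicity; for real UTC+2/UTC+3 daylight-saving behaviour you would use a proper zone (e.g. `zoneinfo.ZoneInfo`) instead:

```python
from datetime import datetime, timezone, timedelta

def solr_date_to_local(value: str, offset_hours: int = 2) -> str:
    """Convert a Solr ISO-8601 UTC timestamp ('...Z') to a fixed local offset."""
    utc = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    local = utc.astimezone(timezone(timedelta(hours=offset_hours)))
    return local.strftime("%Y-%m-%d %H:%M:%S")

print(solr_date_to_local("2019-04-08T10:30:00Z"))  # 2019-04-08 12:30:00
```

The stored value stays UTC in the index; only the rendered string changes, which avoids the reindexing problems mentioned elsewhere in this thread.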


Thanks,
Shawn


Re: Moving index from stand-alone Solr 6.6.0 to 3 node Solr Cloud 6.6.0 with Zookeeper

2019-04-08 Thread Erick Erickson
Here’s what I’d do:

1> Just spin up a _one_ node cluster and copy the index from your offline 
process and start Solr. I’l probably do this with Solr down.
2> Use the ADDREPLICA command to build out that cluster. The index copy 
associated with ADDREPLICA is robust. I’d wait until each replica showed green 
before adding the next one if you have any concerns about saturating your 
network, if you added the replicas all at once they you’ll have N simultaneous 
copies of the 50G index.

I’m not quite sure what’s happening in your situation, there are a lot of 
possibilities. The above should just avoid most all of the places where 
something could go wrong with your process.

Best,
Erick

> On Apr 8, 2019, at 7:59 AM, kevinc  wrote:
> 
> Hi all,
> 
> I'm sure I've done this before but this seems to be falling down a bit and I
> was wondering if anyone had any helpful ideas.
> 
> I have a large index (51GB) that exists in a 4 node Solr Cloud instance. The
> reprocessing for this takes a long time and so we normally reindex on a
> secondary cluster and swap them out.
> 
> I have reindexed to a single Solr 6.6.0 index and spun up a new 3 node Solr
> cluster with 1 shard and replication factor of 3.
> 
> I want to copy over the index and have it replicate to the rest of the
> cluster. I have taken a copy of the data directory from the reprocessed core
> and copied it into the leader's data directory. This shows up correctly as
> having a 51GB index and the documents are searchable.
> 
> I have tried the following curl commands to kick off replication:
> 
> curl http://localhost:8983/solr/solrCollection1/update -H "Content-Type:
> text/xml" --data-binary @test.xml
> curl
> http://localhost:8983/solr/solrCollection1/update?stream.body=%3Ccommit/%3E
> 
> I've tried this a few times and had a few different results:
> The index gets set to 0 and has the single record I commit
> A timed index gets created (index.201904082111232) and index.properties then
> points to that
> I had an issue with IndexWriter being closed
> The index stays consistent and doesn't replicate
> I've tried copying the index to both the leader and one other node to see if
> that helps but I'm faced with similar results as above.
> 
> Does anyone have any advice to how I can get this index moved and replicated
> onto this new cluster?
> 
> Thanks a lot!
> Kevin.
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Is it possible to configure Solr to show time stamps without the 'Z' character at the end

2019-04-08 Thread Erick Erickson
When you ask for a field from Solr, it returns _exactly_ what you gave it. So 
if your input contains the "Z", the output will too. You have to massage it 
however you want if you want something different. I can imagine at least 3 ways 
to do this:

1> Create a second field with stored="true", indexed="false", docValues="false" 
as a "string" type. On your real date field, set stored="false". Now you 
search/group/facet/whatever on your date field and specify the new field in 
your "fl" list. It would be easy to do all this in a ScriptUpdateProcessor so 
your client(s) wouldn't have to deal with it.

2> Just have your client app take the date and transform it into something more 
pleasing.

3> Use a document transformer (see “Transforming Result Documents” in the 
reference guide) to change the docs on the way out.

However, it's a different time you're going to show your users vs. the actual 
time in the document. You want to take the "Z" off and/or change it to UTC+2, 
but that misinforms the user about the actual time by 2 hours unless you also 
shift the value shown to match.

Best,
Erick


> On Apr 8, 2019, at 3:38 AM, Miettinen Jaana (STAT)  
> wrote:
> 
> Dear recipient,
> 
> I have a problem in solr: I should add several (old) time stamps into my solr 
> documents, but all of them are in  local time (UTC+2 or UTC+3 depending on 
> day-light-saving situation). By default solr expects all time stamps to be in 
> UTC-time and adds the 'Z'-character into the end of the time stamp-strings to 
> indicate, that the date should be considered as UTC-time.
> 
> Is it possible to change this 'Z'-notation ? Either I would want to get rid 
> of that 'Z' or change it to denote UTC+2.
> 
> I noticed that there's variable SOLR_TIMEZONE in 
> solr-7.6.0/bin/solr.in.sh-file. I changed it to  SOLR_TIMEZONE="EST", 
> re-created my solr-servers, but nothing changed. Why was that configuration 
> file ignored (I also changed the port to check whether it was ignored really) 
> ? And what is the purpose of  SOLR_TIMEZONE-variable ?
> 
> Br, Jaana Miettinen



Moving index from stand-alone Solr 6.6.0 to 3 node Solr Cloud 6.6.0 with Zookeeper

2019-04-08 Thread kevinc
Hi all,

I'm sure I've done this before but this seems to be falling down a bit and I
was wondering if anyone had any helpful ideas.

I have a large index (51GB) that exists in a 4 node Solr Cloud instance. The
reprocessing for this takes a long time and so we normally reindex on a
secondary cluster and swap them out.

I have reindexed to a single Solr 6.6.0 index and spun up a new 3 node Solr
cluster with 1 shard and replication factor of 3.

I want to copy over the index and have it replicate to the rest of the
cluster. I have taken a copy of the data directory from the reprocessed core
and copied it into the leader's data directory. This shows up correctly as
having a 51GB index and the documents are searchable.

I have tried the following curl commands to kick off replication:

curl http://localhost:8983/solr/solrCollection1/update -H "Content-Type:
text/xml" --data-binary @test.xml
curl
http://localhost:8983/solr/solrCollection1/update?stream.body=%3Ccommit/%3E

I've tried this a few times and had a few different results:
The index gets set to 0 and has the single record I commit
A timed index gets created (index.201904082111232) and index.properties then
points to that
I had an issue with IndexWriter being closed
The index stays consistent and doesn't replicate
I've tried copying the index to both the leader and one other node to see if
that helps but I'm faced with similar results as above.

Does anyone have any advice on how I can get this index moved and replicated
onto this new cluster?

Thanks a lot!
Kevin.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Performance problems with extremely common terms in collection (Solr 7.4)

2019-04-08 Thread Michael Gibney
In addition to Toke's suggestions (and those in the linked article), some
more ideas:
If single-term, bare queries are slow, it might be productive to check
config/performance of your queryResultCache (I realize this doesn't
directly address the concern of slow queries, but might nonetheless be
helpful in practice).
If multi-term queries that include these terms are slow, maybe check your
mm config to make sure it's not more inclusive than necessary for your use
case (scoring over union of docSets/clauses). If multi-term queries get
faster by disabling pf, you could try disabling main-query pf, and invoke
implicit phrase search (pseudo-pf) using ReRankQParser?
If you're able to share your configs (built queries, indexing/fieldType
config (positions, payloads?), etc.), that might enable more specific
advice.
I'm assuming the query-times posted are for queries that isolate the
performance of main query only (i.e., no other components, like facets,
etc.)?
Michael
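The Common Grams approach referenced in this thread is configured at the fieldType level. A minimal sketch (the analyzer chain and word-list filename are illustrative):

```xml
<fieldType name="text_commongrams" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Pairs high-frequency words with their neighbours at index time -->
    <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Query side emits only the grams so phrases skip the huge postings -->
    <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

As Toke notes, this only speeds up phrase queries; bare single-term queries on a term present in 90%+ of documents still have to score every matching document.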

On Mon, Apr 8, 2019 at 3:28 AM Ash Ramesh  wrote:

> Hi Toke,
>
> Thanks for the prompt reply. I'm glad to hear that this is a common
> problem. In regards to stop words, I've been thinking about trying that
> out. In our business case, most of these terms are keywords related to
> stock photography, therefore it's natural for 'photography' or 'background'
> to appear commonly in a document's keyword list. it seems unlikely we can
> use the common grams solution with our business case.
>
> Regards,
>
> Ash
>
> On Mon, Apr 8, 2019 at 5:01 PM Toke Eskildsen  wrote:
>
> > On Mon, 2019-04-08 at 09:58 +1000, Ash Ramesh wrote:
> > > We have a corpus of 50+ million documents in our collection. I've
> > > noticed that some queries with specific keywords tend to be extremely
> > > slow.
> > > E.g. the q=`photography' or q='background'. After digging into the
> > > raw documents, I could see that these two terms appear in greater
> > > than 90% of all documents, which means solr has to score each of
> > > those documents.
> >
> > That is known behaviour, which can be remedied somewhat. Stop words is
> > a common approach, but your samples does not seem to fit well with
> > that. Instead you can look at Common Grams, where your high-frequency
> > words gets concatenated with surrounding words. This only works with
> > phrases though. There's a nice article at
> >
> >
> >
> https://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
> >
> > - Toke Eskildsen, Royal Danish Library
> >
> >
> >
>


Re: Solr 8.0.0 - CPU usage 100% when indexed documents

2019-04-08 Thread vishal patel
I have created two Solr shards with 3 ZooKeeper nodes. First I do an upconfig in 
ZooKeeper, then start both Solr instances on different ports, then create an 
"actionscomments" collection using an API call.

When I index one document into actionscomments, my CPU utilization goes high.

Note :
upconfig command ::  zkcli.bat -zkhost 
192.168.100.145:3181,192.168.100.145:3182,192.168.100.145:3183 -cmd upconfig 
-confdir E:/SolrCloud-8-0-0/solr1/server/solr/configsets/actionscomments/conf 
-confname actionscomments. 
[E:\SolrCloud-8-0-0\solr1\server\scripts\cloud-scripts]
Solr start command ::  solr start -p 7991 and solr start -p 7992 
[E:\SolrCloud-8-0-0\solr1\bin and E:\SolrCloud-8-0-0\solr2\bin]
Create a collection :: 
http://192.168.102.150:7991/solr/admin/collections?_=1554285992377&action=CREATE&autoAddReplicas=false&collection.configName=actionscomments&maxShardsPerNode=1&name=actionscomments&numShards=2&replicationFactor=1&router.name=compositeId&wt=json
Operating system :: windows server 2008 R2 standard

When I index a document, CPU goes high, and in the thread dump I noticed 
commitScheduler-25-thread-2, commitScheduler-48-thread-2 
and commitScheduler-21-thread-2. After some time they are automatically removed and 
CPU goes down.

In the log file I cannot find any errors, and I index documents using 
AsiteSolrCloudManager.

I have attached my solrconfig.xml and schema.xml and also thread dump which got 
from solr admin GUI.

Sent from Outlook

From: Jörn Franke 
Sent: Monday, April 8, 2019 4:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 8.0.0 - CPU usage 100% when indexed documents

Can you please describe your scenario in detail ?

How does your load process look like (custom module? How many threads?)?

How many files do you try to index ? What is their format?
How does your solr config look like?

How many cores do you have? What else is installed on the Solr server?

Which Operation System?

What do the log files tell your from Solr and Zookeeper?

What is the Schema looking like?

> Am 08.04.2019 um 12:01 schrieb vishal patel :
>
> Hi
>
> I have configured 2 shards and 3 ZooKeeper nodes. When I index a document into a 
> collection, my CPU usage becomes full.
> I have attached thread dump.
> Is there Any changes needed in solrconfig.xml?
>
> Sent from Outlook



[Attachments stripped by the list archive: the quoted solrconfig.xml and schema.xml survive only as scattered fragments (field and fieldType definitions, update log and autoCommit settings, request handler and clustering configuration), so they are not reproduced here.]
  

Is it possible to configure solr to show time stamps without the 'Z' character at the end

2019-04-08 Thread Miettinen Jaana (STAT)
Dear recipient,

I have a problem in Solr: I need to add several (old) time stamps to my Solr
documents, but all of them are in local time (UTC+2 or UTC+3, depending on
the daylight-saving situation). By default, Solr expects all time stamps to
be in UTC and appends the 'Z' character to the end of the time stamp strings
to indicate that the date should be interpreted as UTC.

Is it possible to change this 'Z' notation? I would like either to get rid
of the 'Z' or to change it to denote UTC+2.

I noticed there is a SOLR_TIMEZONE variable in the
solr-7.6.0/bin/solr.in.sh file. I changed it to SOLR_TIMEZONE="EST" and
re-created my Solr servers, but nothing changed. Why was that configuration
file ignored (I also changed the port there to check whether it was really
ignored)? And what is the purpose of the SOLR_TIMEZONE variable?

Br, Jaana Miettinen


Re: Solr ignores configuration file

2019-04-08 Thread Jörn Franke
Unless there is no daylight saving time at all, I would not do this. Accept
that Solr stores times in UTC and do the conversion at the UI level.
Otherwise, when daylight saving time is introduced or removed, you run into
a lot of problems (reindexing, etc.)
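A minimal sketch of that UI-level conversion (Python chosen purely for
illustration; a fixed UTC+2 offset is assumed here only to keep the example
self-contained -- real code should use a tz database zone such as
Europe/Helsinki so daylight-saving transitions are handled automatically):

```python
from datetime import datetime, timezone, timedelta

# Fixed UTC+2 offset for illustration only; prefer a named tz zone in practice.
LOCAL = timezone(timedelta(hours=2))

def solr_to_local(solr_ts: str) -> str:
    """Render a Solr UTC timestamp ("...Z") in local time, without the 'Z'."""
    dt = datetime.strptime(solr_ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return dt.astimezone(LOCAL).strftime("%Y-%m-%d %H:%M:%S")

print(solr_to_local("2019-04-08T12:30:00Z"))  # 2019-04-08 14:30:00
```

The index stays in UTC; only the presentation layer shifts the value.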

> Am 08.04.2019 um 13:08 schrieb Nitin Kumar :
> 
> One workaround is while indexing add +2 hours.
> 
>> On Mon 8 Apr, 2019, 4:16 PM ,  wrote:
>> 
>> 
>> Dear recipients,
>> 
>> Can you help me with the following issue:
>> 
>> I should present my time stamps in solr in UTC+2 instead of UTC. How can
>> I do it ?
>> 
>> I've created the following question in StackOverflow
>> 
>> 
>> https://stackoverflow.com/questions/55530142/solr-7-6-0-ignores-configuration-file-bin-solr-in-sh?noredirect=1#comment97766221_55530142
>> 
>> Br, Jaana Miettinen
>> 
>> 


Re: Solr ignores configuration file

2019-04-08 Thread Nitin Kumar
One workaround is to apply the two-hour offset to the time stamps while indexing.
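A rough sketch of such an index-time conversion (Python used purely for
illustration; note the direction -- a UTC+2 local time maps to UTC by
subtracting two hours -- and that real code should use a named tz zone so
daylight saving is handled):

```python
from datetime import datetime, timezone, timedelta

# Fixed UTC+2 offset for illustration; use a tz database zone in real code.
LOCAL = timezone(timedelta(hours=2))

def to_solr_utc(local_ts: str) -> str:
    """Convert a naive local (UTC+2) timestamp to Solr's UTC 'Z' format."""
    dt = datetime.strptime(local_ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=LOCAL)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_solr_utc("2019-04-08 14:30:00"))  # 2019-04-08T12:30:00Z
```

The drawback, as noted elsewhere in this thread, is that a fixed offset
breaks whenever daylight saving starts or ends.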

On Mon 8 Apr, 2019, 4:16 PM ,  wrote:

>
> Dear recipients,
>
> Can you help me with the following issue:
>
> I should present my time stamps in solr in UTC+2 instead of UTC. How can
> I do it ?
>
> I've created the following question in StackOverflow
>
>
> https://stackoverflow.com/questions/55530142/solr-7-6-0-ignores-configuration-file-bin-solr-in-sh?noredirect=1#comment97766221_55530142
>
> Br, Jaana Miettinen
>
>


Solr Cloud - Data Import from Cassandra

2019-04-08 Thread Furkan Çifçi
Hello everyone,

We are using Solr (7.1) in cloud mode and trying to pull data from a
Cassandra source, but the import fails.

In the error logs:

Full Import 
failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to 
PropertyWriter implementation:SimplePropertiesWriter
at 
org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:330)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)
at 
org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:457)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.cloud.ZooKeeperException: 
ZkSolrResourceLoader does not support getConfigDir() - likely, what you are 
trying to do is not supported in ZooKeeper mode
at 
org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:151)
at 
org.apache.solr.handler.dataimport.SimplePropertiesWriter.findDirectory(SimplePropertiesWriter.java:131)
at 
org.apache.solr.handler.dataimport.SimplePropertiesWriter.init(SimplePropertiesWriter.java:93)
at 
org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:328)

The error log says I can't do this in ZooKeeper mode.

Is there a workaround for this situation?
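One possibility worth checking (an assumption on my part -- please verify
against the DIH documentation for your version): data-config.xml accepts a
propertyWriter element, and pointing it at the ZooKeeper-backed writer may
avoid the getConfigDir() call that fails in cloud mode:

```xml
<dataConfig>
  <!-- Sketch: in SolrCloud, ask DIH to persist dataimport.properties
       (last-index timestamps) in ZooKeeper instead of the local config dir. -->
  <propertyWriter type="ZKPropertiesWriter"/>
  <!-- existing dataSource / document / entity definitions stay as they are -->
</dataConfig>
```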

This message may contain confidential information and is intended only for
the named recipient. If you are not the named addressee you should not
disseminate, distribute or copy this e-mail. Please notify the sender
immediately if you have received this e-mail by mistake and delete this
e-mail from your system. Finally, the recipient should check this email and
any attachments for the presence of viruses. İŞLEM GIS® accepts no liability
for any damage that may be caused by any virus transmitted by this email.
For information: b...@islem.com.tr


Re: Solr 8.0.0 - CPU usage 100% when indexed documents

2019-04-08 Thread Jörn Franke
Can you please describe your scenario in detail?

What does your load process look like (custom module? how many threads?)?

How many files are you trying to index? What is their format?
What does your Solr config look like?

How many cores do you have? What else is installed on the Solr server?

Which operating system?

What do the Solr and ZooKeeper log files tell you?

What does the schema look like?

> Am 08.04.2019 um 12:01 schrieb vishal patel :
> 
> Hi
> 
> I have configured 2 shards and 3 ZooKeeper nodes. When I index documents
> into a collection, my CPU usage goes to 100%.
> I have attached a thread dump.
> Are any changes needed in solrconfig.xml?
> 
> Sent from Outlook


Solr ignores configuration file

2019-04-08 Thread jaanam



Dear recipients,

Can you help me with the following issue:

I need to present my time stamps in Solr in UTC+2 instead of UTC. How can
I do it?


I've created the following question in StackOverflow

https://stackoverflow.com/questions/55530142/solr-7-6-0-ignores-configuration-file-bin-solr-in-sh?noredirect=1#comment97766221_55530142

Br, Jaana Miettinen



Solr 8.0.0 - CPU usage 100% when indexed documents

2019-04-08 Thread vishal patel
Hi

I have configured 2 shards and 3 ZooKeeper nodes. When I index documents
into a collection, my CPU usage goes to 100%.
I have attached a thread dump.
Are any changes needed in solrconfig.xml?

Sent from Outlook


Re: Solr spellcheck Collation JSON

2019-04-08 Thread Mikhail Khludnev
>
> Previous Solr versions
> --
> "spellcheck": {
> ...,
> "collations": [
> "collation":"account" <--correct format
> ]

However, that is not valid JSON.
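If the client code needs the old key/value shape back, one knob that may be
worth checking (an assumption on my part, not verified against 7.4) is the
json.nl parameter, which controls how Solr serializes NamedList structures
such as the spellcheck section -- for example as a request-handler default
in solrconfig.xml:

```xml
<!-- Sketch: json.nl=map asks the JSON response writer to render NamedList
     output as a JSON object instead of a flat array of alternating
     keys and values. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">json</str>
    <str name="json.nl">map</str>
  </lst>
</requestHandler>
```

Whether this restores the exact pre-7.x collation shape would need testing.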


On Mon, Apr 8, 2019 at 2:45 AM Moyer, Brett  wrote:

> Hello,
>
> It looks like a more recent Solr release introduced a bug in the
> collation output. Does anyone know of a way to correct it, or whether a
> future release will address it? Because of this change we had to make the
> app teams rewrite their code. (It made us look bad: from their
> perspective, we can't control our code and introduced a bug.) Thanks
>
> Solr 7.4
> --
> "spellcheck": {
> "suggestions": [
> "acount",
> {
> "numFound": 1,
> "startOffset": 0,
> "endOffset": 6,
> "suggestion": [
> "account"
> ]
> }
> ],
> "collations": [
> "collation", <-this is the bad line
> "account"
> ]
>
> Previous Solr versions
> --
> "spellcheck": {
> "suggestions": [
> "acount",
> {
> "numFound": 1,
> "startOffset": 0,
> "endOffset": 6,
> "suggestion": [
> "account"
> ]
> }
> ],
> "collations": [
> "collation":"account" <--correct format
> ]
>
> Brett Moyer
> *
> This e-mail may contain confidential or privileged information.
> If you are not the intended recipient, please notify the sender
> immediately and then delete it.
>
> TIAA
> *
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Performance problems with extremely common terms in collection (Solr 7.4)

2019-04-08 Thread Ash Ramesh
Hi Toke,

Thanks for the prompt reply. I'm glad to hear that this is a common
problem. In regards to stop words, I've been thinking about trying that
out. In our business case, most of these terms are keywords related to
stock photography, so it's natural for 'photography' or 'background'
to appear commonly in a document's keyword list. It seems unlikely we can
use the common grams solution in our business case.

Regards,

Ash

On Mon, Apr 8, 2019 at 5:01 PM Toke Eskildsen  wrote:

> On Mon, 2019-04-08 at 09:58 +1000, Ash Ramesh wrote:
> > We have a corpus of 50+ million documents in our collection. I've
> > noticed that some queries with specific keywords tend to be extremely
> > slow.
> > E.g. the q=`photography' or q='background'. After digging into the
> > raw documents, I could see that these two terms appear in greater
> > than 90% of all documents, which means solr has to score each of
> > those documents.
>
> That is known behaviour, which can be remedied somewhat. Stop words is
> a common approach, but your samples do not seem to fit well with
> that. Instead you can look at Common Grams, where your high-frequency
> words gets concatenated with surrounding words. This only works with
> phrases though. There's a nice article at
>
>
> https://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
>
> - Toke Eskildsen, Royal Danish Library
>
>
>

Re: Performance problems with extremely common terms in collection (Solr 7.4)

2019-04-08 Thread Toke Eskildsen
On Mon, 2019-04-08 at 09:58 +1000, Ash Ramesh wrote:
> We have a corpus of 50+ million documents in our collection. I've
> noticed that some queries with specific keywords tend to be extremely
> slow.
> E.g. the q=`photography' or q='background'. After digging into the
> raw documents, I could see that these two terms appear in greater
> than 90% of all documents, which means solr has to score each of
> those documents.

That is known behaviour, which can be remedied somewhat. Stop words is
a common approach, but your samples do not seem to fit well with
that. Instead you can look at Common Grams, where your high-frequency
words gets concatenated with surrounding words. This only works with
phrases though. There's a nice article at

https://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2

- Toke Eskildsen, Royal Danish Library
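
To make the suggestion concrete, a Common Grams analysis chain in schema.xml
might look roughly like this (a sketch only -- the words file is an
assumption and would list your own high-frequency terms such as
"photography" and "background"):

```xml
<!-- Sketch: at index time, CommonGramsFilterFactory emits bigrams that glue
     high-frequency words to their neighbours; at query time the matching
     query filter uses those bigrams for phrase queries. -->
<fieldType name="text_commongrams" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

As the article notes, this helps phrase queries containing common words, but
not single-term queries for the common words themselves.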