solr tika extraction video creation date problem (hours ahead)

2019-04-03 Thread Where is Where
Hello, I was following the instructions at
https://lucene.apache.org/solr/guide/7_1/uploading-data-with-solr-cell-using-apache-tika.html
to upload files with metadata stored and indexed in Solr. I was checking
the extracted creation date (attr_meta_creation_date). For images (JPG
etc.) the creation dates are correct, but all creation dates for videos
are 11 hours ahead of the actual creation date. (The dates are correct when
viewed in other applications.) This inconsistency causes problems with
searching. Any idea is much appreciated. Thanks!
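If the shift turns out to be timezone-related (video containers such as MP4/QuickTime store creation times in UTC, and some extraction paths report them as if they were local time), one client-side workaround is to normalize the extracted value before sending it to Solr. A minimal sketch, assuming the observed fixed 11-hour offset and Solr's ISO-8601 date format:

```python
from datetime import datetime, timedelta

def normalize_creation_date(raw: str, hours_ahead: int = 11) -> str:
    """Shift an extracted creation date back by a known offset.

    `hours_ahead` is an assumption based on the observed behaviour;
    verify it against files whose real creation time is known.
    """
    dt = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")
    return (dt - timedelta(hours=hours_ahead)).strftime("%Y-%m-%dT%H:%M:%SZ")

print(normalize_creation_date("2019-04-03T21:00:00Z"))  # -> 2019-04-03T10:00:00Z
```

This only papers over the symptom; finding which layer (Tika parser vs. JVM default timezone) applies the offset is the real fix.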


Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Zheng Lin Edwin Yeo
Hi David,

Yes, I do have this field "_root_" in the schema:

   <field name="_root_" type="string" indexed="true" stored="false"/>

However, I don't think I have used the field, and there is no difference in
the indexing speed after I removed it.

Regards,
Edwin

On Wed, 3 Apr 2019 at 22:57, David Smiley  wrote:

> Hi Edwin,
>
> I'd like to rule something out.  Does your schema define a field "_root_"?
> If you don't have nested documents then remove it.  Its presence adds
> indexing weight in 8.0 that was not there previously.  I'm not sure how
> much, though; I've hoped it's small, but who knows.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Apr 2, 2019 at 10:17 PM Zheng Lin Edwin Yeo 
> wrote:
>
> > Hi,
> >
> > I am setting up the latest Solr 8.0.0, and I am re-indexing the data from
> > scratch in Solr 8.0.0.
> >
> > However, I found that the indexing speed is slower in Solr 8.0.0, as
> > compared to earlier versions like Solr 7.7.1. I have not changed the
> > schema.xml and solrconfig.xml yet, just changed the luceneMatchVersion
> > in solrconfig.xml to 8.0.0:
> > <luceneMatchVersion>8.0.0</luceneMatchVersion>
> >
> > On average, the speed is about 40% to 50% slower. For example, the
> indexing
> > speed was about 17 mins in Solr 7.7.1, but now it takes about 25 mins to
> > index the same set of data.
> >
> > What could be the reason that causes the indexing to be slower in Solr
> > 8.0.0?
> >
> > Regards,
> > Edwin
> >
>


high cpu threads (solr 7.5)

2019-04-03 Thread Hari Nakka
We are noticing high CPU utilization on the threads below. It looks like a
known issue (https://github.com/netty/netty/issues/327).

But I'm not sure whether this has been addressed in any of the JDK 1.8
releases.

Can anyone help with this?


Version: solr cloud 7.5

OS: CentOS 7

JDK: Oracle JDK 1.8.0_191





"qtp574568002-3821728" #3821728 prio=5 os_prio=0 tid=0x7f4f20018000 nid=0x4996 runnable [0x7f51fc6d8000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00064cded430> (a sun.nio.ch.Util$3)
    - locked <0x00064cded418> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00064cdf6e38> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
    at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:396)
    at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:333)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:357)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:181)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)
    at java.lang.Thread.run(Thread.java:748)
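The jstack `nid` field is the OS thread id in hex, while `top -H -p <solr-pid>` reports the same id in decimal, which makes it easy to confirm whether a dumped thread is actually the one burning CPU. A small conversion sketch (the decimal value below is just the `nid` from the dump above):

```python
def tid_to_nid(tid: int) -> str:
    """Convert a decimal thread id (as shown by `top -H`) to the
    hex `nid` form used in Java thread dumps."""
    return f"nid=0x{tid:x}"

print(tid_to_nid(18838))  # -> nid=0x4996, the thread shown in the dump above
```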


Re: [ANNOUNCE] Apache Solr 8.0.0 released

2019-04-03 Thread Noble Paul
Thanks Jim

On Fri, Mar 15, 2019 at 1:39 AM Toke Eskildsen  wrote:
>
> On Thu, 2019-03-14 at 13:16 +0100, jim ferenczi wrote:
> > http://lucene.apache.org/solr/8_0_0/changes/Changes.html
>
> Thank you for the hard work of rolling the release!
> Looking forward to upgrading.
>
> - Toke Eskildsen, Royal Danish Library
>
>


-- 
-
Noble Paul


Re: Indexing PDF files in SqlBase database

2019-04-03 Thread Arunas Spurga
Yes, I know the reasons for putting this work on a client rather than using
Solr directly, and that should perhaps be my next task.
But first I need to finish my current task: indexing PDF files stored in a
SqlBase database. The PDF files are pretty simple, sometimes only a few
dozen lines of text.

Regards,

Aruna

On Wed, Apr 3, 2019 at 5:03 PM Erick Erickson 
wrote:

> For a lot of reasons, I greatly prefer to put this work on a client rather
> than use Solr directly. Here’s a place to get started: it connects to a DB
> and also scans a local file directory for docs to push through (local) Tika
> and index. So you should be able to modify it relatively easily to get the
> data from SqlBase, read the associated PDF, combine the two, and send to
> Solr.
>
> https://lucidworks.com/2012/02/14/indexing-with-solrj/
>
> The code itself is a bit old, but illustrates the process.
>
> Best,
> Erick
>
> > On Apr 2, 2019, at 11:46 PM, Arunas Spurga  wrote:
> >
> > Hello,
> >
> > I got a task to index, in Solr 7.7.1, PDF files which are stored in a
> > SqlBase database. I did half the job: I can index all the table fields
> > and search in them, except the field in which the PDF file content is
> > stored. As I am totally new to Solr, I have unsuccessfully spent a lot
> > of time trying to understand how to extract and index the field with
> > the PDF content. I need help.
> >
> > Regards,
> >
> > Aruna
> >
> > In solrconfig.xml I have:
> >
> > <lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler/lib"
> >      regex=".*\.jar" />
> > <lib dir="${solr.install.dir:../../../..}/dist/"
> >      regex="solr-dataimporthandler-.*\.jar" />
> > <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib"
> >      regex=".*\.jar" />
> > <lib dir="${solr.install.dir:../../../..}/dist/"
> >      regex="solr-cell-\d.*\.jar" />
> >
> > <requestHandler name="/update/extract" startup="lazy"
> >                 class="solr.extraction.ExtractingRequestHandler">
> >   <lst name="defaults">
> >     <str name="lowernames">true</str>
> >     <str name="fmap.meta">ignored_</str>
> >     <str name="fmap.content">_text_</str>
> >   </lst>
> > </requestHandler>
> >
> > <requestHandler name="/dataimport"
> >                 class="org.apache.solr.handler.dataimport.DataImportHandler">
> >   <lst name="defaults">
> >     <str name="config">db-data-config.xml</str>
> >   </lst>
> > </requestHandler>
> >
> > And in db-data-config.xml:
> >
> > <dataConfig>
> >   <dataSource type="JdbcDataSource"
> >               driver="jdbc.unify.sqlbase.SqlbaseDriver"
> >               url="jdbc:sqlbase://localhost:2155/PDFDOCS"
> >               user="sysadm" password="sysadm" />
> >   <document>
> >     <entity name="PDFDOCUMENTS"
> >             query="select ID, PDOCUMENT, UNIT from SYSADM.DOCS">
> >       <field column="PDOCUMENT" name="PDF" />
> >     </entity>
> >   </document>
> > </dataConfig>
>
>


Re: Unexpected docvalues type SORTED_NUMERIC Exception when grouping by a PointField facet

2019-04-03 Thread Erick Erickson
Looks like: https://issues.apache.org/jira/browse/SOLR-11728

> On Apr 3, 2019, at 1:09 AM, JiaJun Zhu  wrote:
> 
> Hello,
> 
> 
> I got an "Unexpected docvalues type SORTED_NUMERIC" exception when I perform 
> a group facet on an IntPointField. Debugging into the source code, the cause 
> is that internally the docvalues type for PointField is "NUMERIC" (single 
> value) or "SORTED_NUMERIC" (multi value), while the TermGroupFacetCollector 
> class requires that the facet field have a "SORTED" or "SORTED_SET" docvalues 
> type: 
> https://github.com/apache/lucene-solr/blob/2480b74887eff01f729d62a57b415d772f947c91/lucene/grouping/src/java/org/apache/lucene/search/grouping/TermGroupFacetCollector.java#L313
> 
> When I change the schema for all int fields to TrieIntField, the group facet 
> works, since internally the docvalues type for TrieField is SORTED (single 
> value) or SORTED_SET (multi value).
> 
> Given that TrieField is deprecated in Solr 7, can someone help with this 
> group facet issue for PointField? I also commented on this issue in 
> SOLR-7495.
> 
> 
> Thanks.
> 
> 
> 
> Best regards,
> 
> JiaJun
> Manager Technology
> Alexander Street, a ProQuest Company
> No. 201 NingXia Road, Room 6J Shanghai China P.R.
> 200063



Re: Indexing PDF files in SqlBase database

2019-04-03 Thread Erick Erickson
For a lot of reasons, I greatly prefer to put this work on a client rather than 
use Solr directly. Here’s a place to get started: it connects to a DB and also 
scans a local file directory for docs to push through (local) Tika and index. So 
you should be able to modify it relatively easily to get the data from SqlBase, 
read the associated PDF, combine the two, and send to Solr.

https://lucidworks.com/2012/02/14/indexing-with-solrj/

The code itself is a bit old, but illustrates the process.

Best,
Erick

> On Apr 2, 2019, at 11:46 PM, Arunas Spurga  wrote:
> 
> Hello,
> 
> I got a task to index, in Solr 7.7.1, PDF files which are stored in a SqlBase
> database. I did half the job: I can index all the table fields and search in
> them, except the field in which the PDF file content is stored. As I am
> totally new to Solr, I have unsuccessfully spent a lot of time trying to
> understand how to extract and index the field with the PDF content. I need
> help.
> 
> Regards,
> 
> Aruna
> 
> In solrconfig.xml I have:
> 
> <lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler/lib"
>      regex=".*\.jar" />
> <lib dir="${solr.install.dir:../../../..}/dist/"
>      regex="solr-dataimporthandler-.*\.jar" />
> <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib"
>      regex=".*\.jar" />
> <lib dir="${solr.install.dir:../../../..}/dist/"
>      regex="solr-cell-\d.*\.jar" />
> 
> <requestHandler name="/update/extract" startup="lazy"
>                 class="solr.extraction.ExtractingRequestHandler">
>   <lst name="defaults">
>     <str name="lowernames">true</str>
>     <str name="fmap.meta">ignored_</str>
>     <str name="fmap.content">_text_</str>
>   </lst>
> </requestHandler>
> 
> <requestHandler name="/dataimport"
>                 class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">db-data-config.xml</str>
>   </lst>
> </requestHandler>
> 
> And in db-data-config.xml:
> 
> <dataConfig>
>   <dataSource type="JdbcDataSource"
>               driver="jdbc.unify.sqlbase.SqlbaseDriver"
>               url="jdbc:sqlbase://localhost:2155/PDFDOCS"
>               user="sysadm" password="sysadm" />
>   <document>
>     <entity name="PDFDOCUMENTS"
>             query="select ID, PDOCUMENT, UNIT from SYSADM.DOCS">
>       <field column="PDOCUMENT" name="PDF" />
>     </entity>
>   </document>
> </dataConfig>
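The flow Erick describes can be sketched as below, with the SqlBase query, the local Tika call, and the Solr update request all stubbed out (these function names are illustrative, not the code from the linked article):

```python
import json

def fetch_rows():
    """Stub for the SqlBase query: in real code, run
    'select ID, PDOCUMENT, UNIT from SYSADM.DOCS' over JDBC/ODBC."""
    return [{"ID": 1, "PDOCUMENT": b"%PDF-1.4 ...", "UNIT": "A"}]

def extract_text(pdf_bytes):
    """Stub for local Tika extraction (e.g. via tika-python or a Tika server)."""
    return "extracted pdf text"

def to_solr_doc(row):
    # Combine the DB columns with the extracted PDF text into one Solr doc.
    return {"id": row["ID"],
            "unit_s": row["UNIT"],
            "content_txt": extract_text(row["PDOCUMENT"])}

docs = [to_solr_doc(r) for r in fetch_rows()]
payload = json.dumps(docs)  # body for a POST to /solr/<core>/update?commit=true
print(docs[0]["content_txt"])  # -> extracted pdf text
```

The point of the client-side approach is exactly this: the DB read, the Tika extraction, and the Solr update are three independent steps you can debug and scale separately.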



Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread David Smiley
Hi Edwin,

I'd like to rule something out.  Does your schema define a field "_root_"?
If you don't have nested documents then remove it.  Its presence adds
indexing weight in 8.0 that was not there previously.  I'm not sure how
much, though; I've hoped it's small, but who knows.
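For reference, the `_root_` declaration in question typically looks like the following in the stock schema (exact attributes may differ in your schema; remove it only if you do not index nested documents):

```xml
<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
```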

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Apr 2, 2019 at 10:17 PM Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> I am setting up the latest Solr 8.0.0, and I am re-indexing the data from
> scratch in Solr 8.0.0.
>
> However, I found that the indexing speed is slower in Solr 8.0.0, as
> compared to earlier versions like Solr 7.7.1. I have not changed the
> schema.xml and solrconfig.xml yet, just changed the luceneMatchVersion
> in solrconfig.xml to 8.0.0:
> <luceneMatchVersion>8.0.0</luceneMatchVersion>
>
> On average, the speed is about 40% to 50% slower. For example, the indexing
> speed was about 17 mins in Solr 7.7.1, but now it takes about 25 mins to
> index the same set of data.
>
> What could be the reason that causes the indexing to be slower in Solr
> 8.0.0?
>
> Regards,
> Edwin
>


Re: SolrCloud with separate JAVA instances

2019-04-03 Thread Shawn Heisey

On 4/3/2019 8:16 AM, Bernd Fehling wrote:
If I now use the Admin GUI at port 8983 and select "Cloud"->"Graph" I 
see both collections.

Also with the Admin GUI at port 7574.
And I can select both collections in the "Collection Selection" dropdown box.

Why, and is this how it should be?

I thought different Java instances at different ports were separated from 
each other?


If you have multiple Solr instances on the same machine, SolrCloud has 
no idea that they are on the same machine.  It will treat them as if 
they are separate machines.


You can see both collections in one admin UI because all of the instances 
are using the same ZooKeeper string when they start.  That means 
they're all part of the same cloud.  They would need to be using 
different ZooKeeper information to be separate -- that could be either 
different ZK servers or a different chroot on the zkstring.
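As a sketch of the chroot approach (the chroot names below are made-up examples; the znodes must exist first, e.g. created with `bin/solr zk mkroot`):

```shell
# One cloud rooted at /cloudA ...
bin/solr zk mkroot /cloudA -z zk1:2181,zk2:2181,zk3:2181
bin/solr start -cloud -p 8983 -z "zk1:2181,zk2:2181,zk3:2181/cloudA"

# ... and an entirely separate cloud rooted at /cloudB on the same ZK ensemble
bin/solr zk mkroot /cloudB -z zk1:2181,zk2:2181,zk3:2181
bin/solr start -cloud -p 7574 -z "zk1:2181,zk2:2181,zk3:2181/cloudB"
```

Instances started with different chroots never see each other's collections, even on the same machines.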


Thanks,
Shawn


Re: SolrCloud with separate JAVA instances

2019-04-03 Thread Erick Erickson
bq. I thought different Java instances at different ports are separated from 
each other?

Not at all. If that were true, how would you use more than one physical 
machine? The combination URL:PORT is, from Solr’s perspective, just some Solr 
node. There’s no assumption about what machine it’s running on, whether there 
are two or more JVMs on the same machine etc.

Best,
Erick

> On Apr 3, 2019, at 7:16 AM, Bernd Fehling  
> wrote:
> 
> I have SolrCloud with a collection "test1" with 5 shards and 2 replicas
> across 5 servers.
> This cloud is started at port 8983 on each server.
> 
> Now I have a second collection "test2" with 5 shards and 1 replica across
> the same 5 servers. But this second collection is started in separate Java
> instances at port 7574 on all 5 servers.
> 
> Both Java instances use the same ZooKeeper pool, but each collection has
> its own config in ZooKeeper.
> 
> If I now use the Admin GUI at port 8983 and select "Cloud"->"Graph" I see
> both collections.
> Also with the Admin GUI at port 7574.
> And I can select both collections in the "Collection Selection" dropdown
> box.
> 
> Why, and is this how it should be?
> 
> I thought different Java instances at different ports were separated from
> each other?
> 
> Regards,
> Bernd



SolrCloud with separate JAVA instances

2019-04-03 Thread Bernd Fehling

I have SolrCloud with a collection "test1" with 5 shards and 2 replicas across 
5 servers.
This cloud is started at port 8983 on each server.

Now I have a second collection "test2" with 5 shards and 1 replica across the
same 5 servers. But this second collection is started in separate Java
instances at port 7574 on all 5 servers.

Both Java instances use the same ZooKeeper pool, but each collection has its
own config in ZooKeeper.

If I now use the Admin GUI at port 8983 and select "Cloud"->"Graph" I see both 
collections.
Also with the Admin GUI at port 7574.
And I can select both collections in the "Collection Selection" dropdown box.

Why, and is this how it should be?

I thought different Java instances at different ports were separated from 
each other?

Regards,
Bernd


Re: Documentation for Apache Solr 8.0.0?

2019-04-03 Thread Yoann Moulin
Hello,

 I’m looking for the documentation for the latest release of Solr (8.0) but 
 it looks like it’s not online yet.

 https://lucene.apache.org/solr/news.html

 http://lucene.apache.org/solr/guide/

 Do you know when it will be available?
>>>
>>> The Solr Reference Guide (of which the online documentation is a part)
>>> gets built and released separately from the Solr distribution itself.
>>> The Solr community tries to keep the code and documentation releases
>>> as close together as we can, but the releases require work and are
>>> done on a volunteer basis. No one has volunteered for the 8.0.0
>>> reference-guide release yet, but I suspect a volunteer will come
>>> forward soon.
>>>
>>> In the meantime though, there is documentation for Solr 8.0.0
>>> available. Solr's documentation is included alongside the code. You
>>> can checkout Solr and build the documentation yourself by moving to
>>> "solr/solr-ref-guide" and running the command "ant clean default" from
>>> that directory. This will build the same HTML pages you're used to
>>> seeing at lucene.apache.org/solr/guide, and you can open the local
>>> copies in your browser and browse them as you normally would.
>>>
>>> Alternatively, the Solr mirror on Github does its best to preview the
>>> documentation. It doesn't display perfectly, but it might be helpful
>>> for tiding you over until the official documentation is available, if
>>> you're unwilling or unable to build the documentation site locally:
>>> https://github.com/apache/lucene-solr/blob/branch_8_0/solr/solr-ref-guide/src/index.adoc
>>
>> There is also a *DRAFT* HTML version of the to-be 8.1 guide built by 
>> Jenkins, see
>> https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.x/javadoc/
>>  
>> 
>> It may serve as a place to read up while waiting for the 8.0 guide, as they 
>> are almost identical still.
>
> The *DRAFT* 8.0 Guide is also available from Jenkins:
>
> https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.0/javadoc/

OK, thanks to all. I just built the doc following Jason's instructions :) 
but it's good to know there is a doc available online too.

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
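The local build described above, as commands (assuming Ant and a checkout of the lucene-solr repository):

```shell
git clone https://github.com/apache/lucene-solr.git
cd lucene-solr && git checkout branch_8_0
cd solr/solr-ref-guide
ant clean default   # builds the HTML pages under the build/ directory
```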


Re: Documentation for Apache Solr 8.0.0?

2019-04-03 Thread Cassandra Targett
The *DRAFT* 8.0 Guide is also available from Jenkins:

https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.0/javadoc/

Cassandra
On Apr 2, 2019, 3:23 AM -0500, Jan Høydahl , wrote:
> There is also a *DRAFT* HTML version of the to-be 8.1 guide built by Jenkins, 
> see
> https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.x/javadoc/
>  
> 
> It may serve as a place to read up while waiting for the 8.0 guide, as they 
> are almost identical still.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 1 Apr 2019 at 16:11, Jason Gerlowski  wrote:
> >
> > The Solr Reference Guide (of which the online documentation is a part)
> > gets built and released separately from the Solr distribution itself.
> > The Solr community tries to keep the code and documentation releases
> > as close together as we can, but the releases require work and are
> > done on a volunteer basis. No one has volunteered for the 8.0.0
> > reference-guide release yet, but I suspect a volunteer will come
> > forward soon.
> >
> > In the meantime though, there is documentation for Solr 8.0.0
> > available. Solr's documentation is included alongside the code. You
> > can checkout Solr and build the documentation yourself by moving to
> > "solr/solr-ref-guide" and running the command "ant clean default" from
> > that directory. This will build the same HTML pages you're used to
> > seeing at lucene.apache.org/solr/guide, and you can open the local
> > copies in your browser and browse them as you normally would.
> >
> > Alternatively, the Solr mirror on Github does its best to preview the
> > documentation. It doesn't display perfectly, but it might be helpful
> > for tiding you over until the official documentation is available, if
> > you're unwilling or unable to build the documentation site locally:
> > https://github.com/apache/lucene-solr/blob/branch_8_0/solr/solr-ref-guide/src/index.adoc
> >
> > Hope that helps,
> >
> > Jason
> >
> > On Mon, Apr 1, 2019 at 7:34 AM Yoann Moulin  wrote:
> > >
> > > Hello,
> > >
> > > I’m looking for the documentation for the latest release of Solr (8.0)
> > > but it looks like it’s not online yet.
> > >
> > > https://lucene.apache.org/solr/news.html
> > >
> > > http://lucene.apache.org/solr/guide/
> > >
> > > Do you know when it will be available?
> > >
> > > Best regards.
> > >
> > > --
> > > Yoann Moulin
> > > EPFL IC-IT
>


Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread David Smiley
What/where is this benchmark?  I recall Ishan was once working with a
volunteer to set up something like what Lucene has, but sadly it was not
successful.

On Wed, Apr 3, 2019 at 6:04 AM Đạt Cao Mạnh  wrote:

> Hi guys,
>
> I'm seeing the same problems with Shalin's nightly indexing benchmark. This
> happened around this period:
> git log --before=2018-12-07 --after=2018-11-21
>
> On Wed, Apr 3, 2019 at 8:45 AM Toke Eskildsen  wrote:
>
>> On Wed, 2019-04-03 at 15:24 +0800, Zheng Lin Edwin Yeo wrote:
>> > Yes, I am using DocValues for most of my fields.
>>
>> So that's a culprit. Thank you.
>>
> > Currently we can't share the test data yet, as some of the records are
> > sensitive. Do you have any data from a CSV file that you can test?
>>
>> Not really. I asked because it was a relatively easy way to do testing
>> (replicate your indexing flow with both Solr 7 & 8 as end-points,
>> attach JVisualVM to the Solrs and compare the profiles).
>>
>>
>> I'll put on my to-do to create a test or two with the scenario
>> "indexing from CSV with many DocValues fields". I'll try and generate
>> some test data and see if I can reproduce with them. If this is to be a
>> JIRA, that's needed anyway. Can't promise when I'll get to it, sorry.
>>
>> If this does turn out to be the cause of your performance regression,
>> the fix (if possible) will be for a later Solr version. Currently it is
>> not possible to tweak the docValues indexing parameters outside of code
>> changes.
>>
>>
>> Do note that we're still operating on guesses here. The cause for your
>> regression might easily be elsewhere.
>>
>> - Toke Eskildsen, Royal Danish Library
>>
>>
>>
>
> --
> Best regards,
> Cao Mạnh Đạt
>
> D.O.B: 31-07-1991
> Cell: (+84) 946.328.329
> E-mail: caomanhdat...@gmail.com
>
-- 
Sent from Gmail Mobile


Spatial Search using two separate fields for lat and long

2019-04-03 Thread Tim Hedlund
Hi all,

I'm importing documents (rows in an Excel file) that include latitude and 
longitude fields. I want to use those two separate fields for searching with a 
bounding box. Is this possible (without using the deprecated LatLonType), or do 
I need to combine them into one single field when indexing? The reason I want 
to keep the fields as two separate ones is that I want to be able to export 
from Solr back to the exact same Excel file structure, i.e. Solr fields map 
exactly to Excel columns.

I'm using Solr 7. Any thoughts or suggestions would be appreciated.

Regards
Tim
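One approach that keeps the two Excel-mapped fields untouched is to add a third, spatial field (e.g. a `location` field of type `LatLonPointSpatialField` in Solr 7) and populate it client-side at index time; the original columns stay as-is for export. A minimal sketch, where all field names are assumptions:

```python
def add_location(doc, lat_field="latitude_d", lon_field="longitude_d",
                 dest_field="location"):
    """Build the 'lat,lon' string a LatLonPointSpatialField expects,
    leaving the original columns untouched for round-tripping to Excel."""
    out = dict(doc)
    out[dest_field] = f"{doc[lat_field]},{doc[lon_field]}"
    return out

row = {"latitude_d": 59.3293, "longitude_d": 18.0686}
print(add_location(row)["location"])  # -> 59.3293,18.0686
```

Bounding-box filters can then target the combined field (e.g. `fq=location:[45,-94 TO 46,-93]`), while exports keep reading the two original fields.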



Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Toke Eskildsen
On Wed, 2019-04-03 at 18:04 +0800, Zheng Lin Edwin Yeo wrote:
> I have tried to set all the docValues in my schema.xml to false and
> run the indexing again.
> There isn't any difference in the indexing speed compared to when
> docValues were enabled.

Thank you for sparing me the work.

- Toke Eskildsen, Royal Danish Library




Re: Solr 7 not removing a node completely due to too small thread pool

2019-04-03 Thread Roger Lehmann
Oh great, thanks for the hint!
I've upvoted this issue, since I think it would be worthwhile to be able to
configure that (rather low) thread pool size.

On Wed, 3 Apr 2019 at 10:23, Shalin Shekhar Mangar 
wrote:

> Thanks Roger. This was reported earlier but missed our attention.
>
> The issue is https://issues.apache.org/jira/browse/SOLR-11208
>
> On Tue, Apr 2, 2019 at 5:56 PM Roger Lehmann 
> wrote:
>
> > To be more specific: I currently have 19 collections, where each node has
> > exactly one replica per collection. A new node will automatically create
> > new replicas on itself, one for each existing collection (see
> > cluster-policy above).
> > So when removing a node, all 19 of its collection replicas need to be
> > removed. This can't be done in one go because the thread count (parallel
> > synchronous execution) is only 10 and does not scale up when necessary.
> >
> > On Fri, 29 Mar 2019 at 14:20, Roger Lehmann  >
> > wrote:
> >
> > > Situation
> > >
> > > I'm currently trying to set up SolrCloud in an AWS Autoscaling Group, so
> > > that it can scale dynamically.
> > >
> > > I've also added the following triggers to Solr, so that each node will
> > > have one (and only one) replica of each collection:
> > >
> > > {
> > > "set-cluster-policy": [
> > >   {"replica": "<2", "shard": "#EACH", "node": "#EACH"}
> > >   ],
> > >   "set-trigger": [{
> > > "name": "node_added_trigger",
> > > "event": "nodeAdded",
> > > "waitFor": "5s",
> > > "preferredOperation": "ADDREPLICA"
> > >   },{
> > > "name": "node_lost_trigger",
> > > "event": "nodeLost",
> > > "waitFor": "120s",
> > > "preferredOperation": "DELETENODE"
> > >   }]
> > > }
> > >
> > > This works pretty well. But my problem is that when a node gets
> > > removed, it doesn't remove all 19 replicas from this node, and I have
> > > problems when accessing the "nodes" page:
> > >
> > > In the logs, this exception occurs:
> > >
> > > Operation deletenode failed: java.util.concurrent.RejectedExecutionException:
> > > Task org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$45/1104948431@467049e2
> > > rejected from org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@773563df
> > > [Running, pool size = 10, active threads = 10, queued tasks = 0, completed tasks = 1]
> > >     at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
> > >     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
> > >     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
> > >     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:194)
> > >     at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
> > >     at org.apache.solr.cloud.api.collections.DeleteReplicaCmd.deleteCore(DeleteReplicaCmd.java:276)
> > >     at org.apache.solr.cloud.api.collections.DeleteReplicaCmd.deleteReplica(DeleteReplicaCmd.java:95)
> > >     at org.apache.solr.cloud.api.collections.DeleteNodeCmd.cleanupReplicas(DeleteNodeCmd.java:109)
> > >     at org.apache.solr.cloud.api.collections.DeleteNodeCmd.call(DeleteNodeCmd.java:62)
> > >     at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:292)
> > >     at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:496)
> > >     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > >     at java.lang.Thread.run(Thread.java:748)
> > >
> > > Problem description
> > >
> > > So, the problem is that it only has a pool size of 10, all 10 of which
> > > are busy, and nothing gets queued (synchronous execution). In fact, it
> > > really only removed 10 replicas and the other 9 replicas stayed there.
> > > When manually sending the API command to delete this node, it works
> > > fine, since Solr only needs to remove the remaining 9 replicas and
> > > everything is good again.
> > > Question
> > >
> > > How can I either increase this (small) thread pool size and/or enable
> > > queueing of the remaining deletion tasks? Another solution might be to
> > > retry the failed task until it succeeds.
> > >
> > > Using Solr 7.7.1 on Ubuntu Server, installed with the installation script
> > > from Solr (so I guess it's using Jetty?).
> > >
> > > Thanks for your help!
> > >
> >
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


-- 

Roger Lehmann
Linux-System-Engineer

T: 0351-418 894 –76
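As Roger observed, a second manual DELETENODE succeeds because only the leftover replicas remain. Until SOLR-11208 is fixed, one client-side workaround sketch is therefore to repeat DELETENODE until the node is empty, since each pass removes at most as many replicas as the overseer pool allows. The Collections API calls are stubbed out below with a simple counter standing in for the cluster state (all names are illustrative):

```python
remaining = {"n": 19}  # replicas still on the node, per CLUSTERSTATUS

def replicas_on_node(node):
    """Stub: in real code, read /admin/collections?action=CLUSTERSTATUS
    and count replicas whose node_name matches `node`."""
    return remaining["n"]

def delete_node(node):
    """Stub: in real code, call /admin/collections?action=DELETENODE&node=...
    Each pass removes at most ~10 replicas (the overseer's fixed pool size)."""
    remaining["n"] = max(0, remaining["n"] - 10)

def drain_node(node, max_rounds=5):
    # Repeat DELETENODE until no replicas are left on the node.
    for _ in range(max_rounds):
        if replicas_on_node(node) == 0:
            return True
        delete_node(node)
    return False

print(drain_node("solr-node-1:8983_solr"))  # -> True (19 -> 9 -> 0)
```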


Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Zheng Lin Edwin Yeo
Hi Toke,

I have tried to set all the docValues in my schema.xml to false and run the
indexing again.
There isn't any difference in the indexing speed compared to when docValues
were enabled.

Seems like the cause of the regression might be somewhere else?

Regards,
Edwin

On Wed, 3 Apr 2019 at 15:45, Toke Eskildsen  wrote:

> On Wed, 2019-04-03 at 15:24 +0800, Zheng Lin Edwin Yeo wrote:
> > Yes, I am using DocValues for most of my fields.
>
> So that's a culprit. Thank you.
>
> > Currently we can't share the test data yet, as some of the records are
> > sensitive. Do you have any data from a CSV file that you can test?
>
> Not really. I asked because it was a relatively easy way to do testing
> (replicate your indexing flow with both Solr 7 & 8 as end-points,
> attach JVisualVM to the Solrs and compare the profiles).
>
>
> I'll put on my to-do to create a test or two with the scenario
> "indexing from CSV with many DocValues fields". I'll try and generate
> some test data and see if I can reproduce with them. If this is to be a
> JIRA, that's needed anyway. Can't promise when I'll get to it, sorry.
>
> If this does turn out to be the cause of your performance regression,
> the fix (if possible) will be for a later Solr version. Currently it is
> not possible to tweak the docValues indexing parameters outside of code
> changes.
>
>
> Do note that we're still operating on guesses here. The cause for your
> regression might easily be elsewhere.
>
> - Toke Eskildsen, Royal Danish Library
>
>
>
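The test-data generation Toke describes could be sketched like this (row count, field names, and the `_i` dynamic-field suffix are assumptions):

```python
import csv
import io
import random

def make_csv(rows=1000, numeric_fields=20, seed=42):
    """Generate a CSV with an id plus many numeric columns, i.e. fields
    that would typically be docValues-enabled in the schema under test."""
    random.seed(seed)
    headers = ["id"] + [f"num{i}_i" for i in range(numeric_fields)]
    buf = io.StringIO()
    w = csv.writer(buf, lineterminator="\n")
    w.writerow(headers)
    for r in range(rows):
        w.writerow([f"doc{r}"] + [random.randint(0, 1_000_000)
                                  for _ in range(numeric_fields)])
    return buf.getvalue()

data = make_csv(rows=3, numeric_fields=2)
print(data.splitlines()[0])  # -> id,num0_i,num1_i
```

The same file can then be posted to the `/update/csv` endpoint of both a Solr 7 and a Solr 8 instance to compare indexing profiles.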


Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Đạt Cao Mạnh
Hi guys,

I'm seeing the same problems with Shalin's nightly indexing benchmark. This
happened around this period:
git log --before=2018-12-07 --after=2018-11-21

On Wed, Apr 3, 2019 at 8:45 AM Toke Eskildsen  wrote:

> On Wed, 2019-04-03 at 15:24 +0800, Zheng Lin Edwin Yeo wrote:
> > Yes, I am using DocValues for most of my fields.
>
> So that's a culprit. Thank you.
>
> > Currently we can't share the test data yet, as some of the records are
> > sensitive. Do you have any data from a CSV file that you can test?
>
> Not really. I asked because it was a relatively easy way to do testing
> (replicate your indexing flow with both Solr 7 & 8 as end-points,
> attach JVisualVM to the Solrs and compare the profiles).
>
>
> I'll put on my to-do to create a test or two with the scenario
> "indexing from CSV with many DocValues fields". I'll try and generate
> some test data and see if I can reproduce with them. If this is to be a
> JIRA, that's needed anyway. Can't promise when I'll get to it, sorry.
>
> If this does turn out to be the cause of your performance regression,
> the fix (if possible) will be for a later Solr version. Currently it is
> not possible to tweak the docValues indexing parameters outside of code
> changes.
>
>
> Do note that we're still operating on guesses here. The cause for your
> regression might easily be elsewhere.
>
> - Toke Eskildsen, Royal Danish Library
>
>
>

-- 
Best regards,
Cao Mạnh Đạt

D.O.B: 31-07-1991
Cell: (+84) 946.328.329
E-mail: caomanhdat...@gmail.com


NestPathField

2019-04-03 Thread Vincenzo D'Amore
Hi all,

I've found the NestPathField field type in the Solr 8.0.0 configuration,
but looking in the documentation I haven't found anything about it.
Just curious: does someone have time to share something about it,
for example how to use it?

Best regards,
Vincenzo

-- 
Vincenzo D'Amore


Unexpected docvalues type SORTED_NUMERIC Exception when grouping by a PointField facet

2019-04-03 Thread JiaJun Zhu
Hello,


I got an "Unexpected docvalues type SORTED_NUMERIC" exception when I perform a 
group facet on an IntPointField. Debugging into the source code, the cause is 
that internally the docvalues type for PointField is "NUMERIC" (single value) or 
"SORTED_NUMERIC" (multi value), while the TermGroupFacetCollector class 
requires that the facet field have a "SORTED" or "SORTED_SET" docvalues type: 
https://github.com/apache/lucene-solr/blob/2480b74887eff01f729d62a57b415d772f947c91/lucene/grouping/src/java/org/apache/lucene/search/grouping/TermGroupFacetCollector.java#L313

When I change the schema for all int fields to TrieIntField, the group facet 
works, since internally the docvalues type for TrieField is SORTED (single 
value) or SORTED_SET (multi value).

Given that TrieField is deprecated in Solr 7, can someone help with this group 
facet issue for PointField? I also commented on this issue in 
SOLR-7495.


Thanks.



Best regards,

JiaJun
Manager Technology
Alexander Street, a ProQuest Company
No. 201 NingXia Road, Room 6J Shanghai China P.R.
200063
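One possible workaround while this is open: the JSON Facet API reads point fields' NUMERIC/SORTED_NUMERIC docValues directly, so a terms facet with a `unique(...)` sub-aggregation can approximate the grouped counts. The field names below are assumptions, and the semantics are not identical to `group.facet`:

```json
{
  "by_int": {
    "type": "terms",
    "field": "my_int_field",
    "facet": { "groups": "unique(group_field)" }
  }
}
```

Passed as the `json.facet` request parameter, `groups` then reports the number of distinct `group_field` values per `my_int_field` bucket without going through TermGroupFacetCollector.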


Re: Solr 7 not removing a node completely due to too small thread pool

2019-04-03 Thread Shalin Shekhar Mangar
Thanks Roger. This was reported earlier but escaped our attention.

The issue is https://issues.apache.org/jira/browse/SOLR-11208

On Tue, Apr 2, 2019 at 5:56 PM Roger Lehmann 
wrote:

> To be more specific: I currently have 19 collections, where each node has
> exactly one replica per collection. A new node will automatically create
> new replicas on itself, one for each existing collection (see
> cluster-policy above).
> So when removing a node, all 19 collection replicas of it need to be
> removed. This can't be done in one go because the thread pool for parallel
> synchronous execution has only 10 threads and does not scale up when necessary.
>
> On Fri, 29 Mar 2019 at 14:20, Roger Lehmann 
> wrote:
>
> > Situation
> >
> > I'm currently trying to set up SolrCloud in an AWS Autoscaling Group, so
> > that it can scale dynamically.
> >
> > I've also added the following triggers to Solr, so that each node will
> > have 1 (and only one) replication of each collection:
> >
> > {
> > "set-cluster-policy": [
> >   {"replica": "<2", "shard": "#EACH", "node": "#EACH"}
> >   ],
> >   "set-trigger": [{
> > "name": "node_added_trigger",
> > "event": "nodeAdded",
> > "waitFor": "5s",
> > "preferredOperation": "ADDREPLICA"
> >   },{
> > "name": "node_lost_trigger",
> > "event": "nodeLost",
> > "waitFor": "120s",
> > "preferredOperation": "DELETENODE"
> >   }]
> > }
> >
> > This works pretty well. But my problem is that when a node gets
> > removed, it doesn't remove all 19 replicas from this node and I have
> > problems when accessing the "nodes" page:
> >
> > [screenshot of the Solr admin "nodes" page omitted]
> >
> > In the logs, this exception occurs:
> >
> > Operation deletenode
> failed:java.util.concurrent.RejectedExecutionException: Task
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$45/1104948431@467049e2
> rejected from
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@773563df[Running,
> pool size = 10, active threads = 10, queued tasks = 0, completed tasks = 1]
> > at
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
> > at
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
> > at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
> > at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:194)
> > at
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
> > at
> org.apache.solr.cloud.api.collections.DeleteReplicaCmd.deleteCore(DeleteReplicaCmd.java:276)
> > at
> org.apache.solr.cloud.api.collections.DeleteReplicaCmd.deleteReplica(DeleteReplicaCmd.java:95)
> > at
> org.apache.solr.cloud.api.collections.DeleteNodeCmd.cleanupReplicas(DeleteNodeCmd.java:109)
> > at
> org.apache.solr.cloud.api.collections.DeleteNodeCmd.call(DeleteNodeCmd.java:62)
> > at
> org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:292)
> > at
> org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:496)
> > at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> > at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > at java.lang.Thread.run(Thread.java:748)
> >
> > Problem description
> >
> > So, the problem is that it only has a pool size of 10, all of which are
> > busy, and nothing gets queued (synchronous execution).
> > only removed 10 replicas and the other 9 replicas stayed there. When
> > manually sending the API command to delete this node it works fine, since
> > Solr only needs to remove the remaining 9 replicas and everything is good
> > again.
> > Question
> >
> > How can I either increase this (small) thread pool size and/or activate
> > queueing the remaining deletion tasks? Another solution might be to retry
> > the failed task until it succeeds.
> >
> > Using Solr 7.7.1 on Ubuntu Server installed with the installation script
> > from Solr (so I guess it's using Jetty?).
> >
> > Thanks for your help!
> >
>


-- 
Regards,
Shalin Shekhar Mangar.
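
Until SOLR-11208 is fixed, a client-side workaround for Roger's case is to re-issue DELETENODE until it is accepted. A hedged sketch follows; the Collections API action and parameters are standard, but the helper names and retry policy are assumptions:

```python
import time
import urllib.parse

def deletenode_url(base: str, node: str) -> str:
    """Build the Collections API DELETENODE URL (Solr 7+)."""
    qs = urllib.parse.urlencode({"action": "DELETENODE", "node": node, "wt": "json"})
    return f"{base}/admin/collections?{qs}"

def retry(op, attempts: int = 5, delay: float = 2.0) -> bool:
    """Call op() until it returns True or the attempts run out."""
    for _ in range(attempts):
        if op():
            return True
        time.sleep(delay)  # fixed backoff between attempts (an assumption; tune)
    return False

if __name__ == "__main__":
    # op() would issue the request and check that the response reports success
    print(deletenode_url("http://localhost:8983/solr", "host1:8983_solr"))
```

The idea is simply to keep calling DELETENODE until the overseer has a free thread; each successful pass removes up to 10 more replicas.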


Basic auth and index replication

2019-04-03 Thread Dwane Hall
Hey Solr community.



I’ve been following a couple of open JIRA tickets relating to use of the basic 
auth plugin in a Solr cluster (https://issues.apache.org/jira/browse/SOLR-12584 
, https://issues.apache.org/jira/browse/SOLR-12860) and recently I’ve noticed 
similar behaviour when adding tlog replicas to an existing Solr collection.  
The problem appears to occur when Solr attempts to replicate the leader's index 
to a follower on another Solr node and it fails authentication in the process.



My environment

Solr cloud 7.6

Basic auth plugin enabled

SSL



Has anyone else noticed similar behaviour when using tlog replicas?



Thanks,



Dwane



2019-04-03T13:27:22,774 5000851636 WARN  : [   ] 
org.apache.solr.handler.IndexFetcher : Master at: 
https://myserver:myport/solr/mycollection_shard1_replica_n17/ is not available. 
Index fetch failed by exception: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at https://myserver:myport/solr/mycollection_shard1_replica_n17: 
Expected mime type application/octet-stream but got text/html. 

Error 401: require authentication

HTTP ERROR 401
Problem accessing /solr/mycollection_shard1_replica_n17/replication. Reason:
require authentication
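
The 401 above suggests the inter-node index fetch goes out without credentials. To confirm that the replication handler itself accepts Basic auth, a manual probe along these lines can help (the URL and credentials are placeholders; the request is only built here, not sent):

```python
import base64
import urllib.request

def replication_check_request(core_url: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a replication-handler probe carrying an explicit
    Basic auth header, i.e. the credential the failing fetch appears to lack."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    url = f"{core_url}/replication?command=indexversion&wt=json"
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})

if __name__ == "__main__":
    req = replication_check_request(
        "https://myserver:8983/solr/mycollection_shard1_replica_n17",
        "solr", "changeme")  # placeholder credentials
    print(req.full_url)
```

If this probe succeeds while the automatic fetch still fails, that points at the inter-node request (as in SOLR-12584/SOLR-12860) rather than the handler configuration.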







Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Toke Eskildsen
On Wed, 2019-04-03 at 15:24 +0800, Zheng Lin Edwin Yeo wrote:
> Yes, I am using DocValues for most of my fields.

So that's a culprit. Thank you.

> Currently we can't share the test data yet as some of the records are
> sensitive. Do you have any data from CSV file that you can test? 

Not really. I asked because it was a relatively easy way to do testing
(replicate your indexing flow with both Solr 7 & 8 as end-points,
attach JVisualVM to the Solrs and compare the profiles).


I'll put it on my to-do list to create a test or two with the scenario
"indexing from CSV with many DocValues fields". I'll try and generate
some test data and see if I can reproduce with them. If this is to be a
JIRA, that's needed anyway. Can't promise when I'll get to it, sorry.

If this does turn out to be the cause of your performance regression,
the fix (if possible) will be for a later Solr version. Currently it is
not possible to tweak the docValues indexing parameters outside of code
changes.


Do note that we're still operating on guesses here. The cause for your
regression might easily be elsewhere.

- Toke Eskildsen, Royal Danish Library




Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Zheng Lin Edwin Yeo
Yes, I am using DocValues for most of my fields.

[schema field definitions stripped by the mail archive]

I am using dynamicFields, for which I have appended suffixes like _s, _i,
etc. to the field names in the CSV file.

[dynamicField definitions stripped by the mail archive]

Currently we can't share the test data yet as some of the records are
sensitive. Do you have any data from CSV file that you can test?
If not we have to remove all the sensitive data before I can share.

Regards,
Edwin
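
If the data does get shared, scrubbing the sensitive columns from the CSV can be scripted. A minimal sketch (the column names are invented; hashing keeps values joinable across rows while making them unreadable):

```python
import csv
import hashlib

SENSITIVE = {"name_s", "email_s"}  # invented column names; adjust to the real schema

def scrub_row(row: dict) -> dict:
    """Replace sensitive values with a short stable hash so records stay joinable."""
    return {k: (hashlib.sha256(v.encode("utf-8")).hexdigest()[:12] if k in SENSITIVE else v)
            for k, v in row.items()}

def scrub_csv(src: str, dst: str) -> None:
    """Stream src to dst, scrubbing only the sensitive columns."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            writer.writerow(scrub_row(row))
```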



On Wed, 3 Apr 2019 at 14:38, Toke Eskildsen  wrote:

> On Wed, 2019-04-03 at 10:17 +0800, Zheng Lin Edwin Yeo wrote:
> > What could be the reason that causes the indexing to be slower in
> > Solr 8.0.0?
>
> As Aroop states there can be multiple explanations. One of them is the
> change to how DocValues are handled in 8.0.0. The indexing impact
> should be tiny, but mistakes happen. With that in mind, do you have
> DocValues enabled for a lot of your fields?
>
> Performance issues like this one are notoriously hard to debug remotely.
> Is it possible for you to share your setup and your test data?
>
> - Toke Eskildsen, Royal Danish Library
>
>
>


Solr not starting after enabling SSL

2019-04-03 Thread Anchal Sharma2
 
Hi All,

We recently migrated our existing Solr (version 5.3.0) from an AIX server to a 
Linux-based server, and it works fine over plain HTTP.
RHEL version 7.6
Java version 1.8(IBM Java)

But now, when trying to enable SSL on the same setup, Solr doesn't start.
It says "Address already in use" despite there being no Solr instance up.
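
A quick, Solr-independent way to check whether something really is bound to the port before starting (the host/port defaults mirror the log below):

```python
import socket

def port_in_use(host: str = "0.0.0.0", port: int = 8983) -> bool:
    """True if something is already bound to host:port, i.e. the condition
    behind a java.net.BindException: Address already in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
        except OSError:
            return True
    return False

if __name__ == "__main__":
    print("8983 busy:", port_in_use())
```

On Linux, `ss -tlnp` (or `netstat -tlnp`) run as root will also show which process holds the port, e.g. a leftover Jetty from the earlier plain-HTTP start.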

2019-04-03 06:29:29.892 WARN  (main) [   ] o.e.j.u.c.AbstractLifeCycle FAILED 
ServerConnector@cdf341f{SSL-http/1.1}{0.0.0.0:8983}: java.net.BindException: 
Address already in use
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:460)
at sun.nio.ch.Net.bind(Net.java:452)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:253)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:86)
at 
org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:321)
at 
org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
at 
org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:236)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.server.Server.doStart(Server.java:366)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at 
org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1255)
at 
java.security.AccessController.doPrivileged(AccessController.java:647)
at 
org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1174)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:90)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:508)
at org.eclipse.jetty.start.Main.invokeMain(Main.java:321)
at org.eclipse.jetty.start.Main.start(Main.java:817)
at org.eclipse.jetty.start.Main.main(Main.java:112)


Steps used to enable Solr SSL: 
https://lucene.apache.org/solr/guide/6_6/enabling-ssl.html (the same guide was 
used to enable SSL on the AIX server's Solr, and we were successful there)

Any suggestion would be highly appreciated!!

Thanks & Regards,
-
Anchal Sharma




Indexing PDF files in SqlBase database

2019-04-03 Thread Arunas Spurga
Hello,

I got a task to index, in Solr 7.7.1, PDF files which are stored in a SqlBase
database. I have done half the job: I can index all the table fields and search
in those fields, except the field in which the PDF file content is stored. As I
am totally new to Solr, I have spent a lot of time unsuccessfully trying to
understand how to extract and index the field with the PDF content. I need
some help.

Regards,

Aruna

in solrconfig.xml I have (the XML elements were stripped by the mail archive;
only their text content survives):

  - an extract request handler whose defaults include:
        true  ignored_  _text_
  - a DataImportHandler section whose config file is:
        db-data-config.xml
  - the contents of db-data-config.xml (also stripped)
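
For pulling the text out of the BLOB column, the DIH documentation describes combining FieldStreamDataSource with TikaEntityProcessor. A hedged sketch follows; the driver/URL, table, column, and Solr field names are all placeholders, and the attribute names should be checked against the 7.x Ref Guide:

```xml
<dataConfig>
  <!-- JDBC source for the table rows (driver and url are placeholders) -->
  <dataSource name="db" driver="your.jdbc.Driver" url="jdbc:..." />
  <!-- streams the BLOB column of the parent row into Tika -->
  <dataSource name="blob" type="FieldStreamDataSource" />
  <document>
    <entity name="doc" dataSource="db"
            query="select id, pdf_content from documents">
      <field column="id" name="id"/>
      <entity name="pdf" dataSource="blob" processor="TikaEntityProcessor"
              dataField="doc.pdf_content" format="text">
        <!-- Tika's extracted plain text lands in the "text" column -->
        <field column="text" name="_text_"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```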


Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Toke Eskildsen
On Wed, 2019-04-03 at 10:17 +0800, Zheng Lin Edwin Yeo wrote:
> What could be the reason that causes the indexing to be slower in
> Solr 8.0.0?

As Aroop states there can be multiple explanations. One of them is the
change to how DocValues are handled in 8.0.0. The indexing impact
should be tiny, but mistakes happen. With that in mind, do you have
DocValues enabled for a lot of your fields?

Performance issues like this one are notoriously hard to debug remotely.
Is it possible for you to share your setup and your test data?

- Toke Eskildsen, Royal Danish Library