Re: How to index and search (integer or float) vector.

2018-04-13 Thread Rick Leir
Jason
One way is simply to use a multivalued field. But this is not officially a 
vector, and the order of the values is not guaranteed. I suspect you can just 
post a document with the values and see them come back in order. 

Searching for a single value would not be very useful.

Another way is to choose a textual representation for the vector, and save it 
as a string (not tokenized). This is more complicated.
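
For illustration, a rough sketch in Python of the multivalued approach; the 
collection name "vectors" and the dynamic field "vec_is" (multivalued integers 
in the default schema) are assumptions, not anything from your setup:

import requests

# Post a document whose vector lives in a multivalued integer field.
doc = {"id": "doc-1", "vec_is": [3, 1, 4, 1, 5, 9]}
requests.post(
    "http://localhost:8983/solr/vectors/update?commit=true",
    json=[doc],  # the JSON update handler accepts a list of documents
).raise_for_status()

# Fetch it back; the values come back as a JSON array, usually in the
# order they were sent, though Solr does not guarantee this.
hits = requests.get(
    "http://localhost:8983/solr/vectors/select",
    params={"q": "id:doc-1"},
).json()
print(hits["response"]["docs"][0]["vec_is"])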

What's the use case? Do you want the vector to be searchable?
Cheers -- Rick


On April 12, 2018 8:44:10 PM EDT, Jason  wrote:
>Hi, I have documents that consist of an integer vector with a fixed
>length. But I have no idea how to index an integer vector and search for
>similar vectors. Which fieldType should I use to solve this problem? And
>can I get an example of how to search?
>
>
>
>--
>Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: How do I create a schema file for FIX data in Solr

2018-04-02 Thread Rick Leir
Google "fix to json"; there are a few interesting leads.

On April 2, 2018 12:34:44 AM EDT, Raymond Xie  wrote:
>Thank you, Shawn, Rick and other readers,
>
>To Shawn:
>
>Take *8=FIX.4.4 9=653 35=RIO* as an example. In the FIX standard, 8
>means BeginString (its value here is FIX.4.4), 9 means body length (653
>for this message), and 35 is the message type (RIO here). 122 stands for
>OrigSendingTime and has the format UTCTimestamp.
>
>You can refer to this page for details: https://www.onixs.biz
>/fix-dictionary/4.2/fields_by_tag.html
>
>All the values are explained as string type.
>
>All the tag numbers are from the FIX standard, so they don't change (in
>my case).
>
>I expect a Python program might be needed to parse the message and
>extract each tag's value; the index is to be built on those extracted
>values along with their field (tag) names.
>
>With the index in place, ideally and naturally users will search for any
>keyword; however, in this case most queries would be based on tag 37
>(Order ID) and 75 (Trade Date). There is another customized tag (not in
>the standard), Order Version, to be queried on.
>
>I understand the parser creation would be a manual process. As long as I
>know or have a small sample program, I will do it myself and maybe
>adjust it as needed.
>
>To Rick:
>
>You mentioned creating a JSON document; my understanding is a parser
>would be needed to generate that JSON document. Do you have any existing
>example code?
>
>
>
>
>Thank you guys very much.
>
>
>
>
>
>
>
>
>
>**
>*Sincerely yours,*
>
>
>*Raymond*
>
>On Sun, Apr 1, 2018 at 2:16 PM, Shawn Heisey 
>wrote:
>
>> On 4/1/2018 10:12 AM, Raymond Xie wrote:
>>
>>> FIX is a format standard of financial data. It contains lots of tags
>in
>>> number with value for the tag, like 8=asdf, where 8 is the tag and
>asdf is
>>> the tag's value. Each tag has its definition.
>>>
>>> The sample msg in FIX format was in the original question.
>>>
>>> All I need to do is to know how to parse the msg and get every tag's
>>> value.
>>>
>>> I found so far that a parser is what I need to start with. But I am
>>> more concerned about how to create the index in Solr on the extracted
>>> tag's value; that is the first step. The next would be to customize
>>> the dashboard for users to search with a value, find out which msg
>>> contains that value in which tag, and present users the whole msg as
>>> proof.
>>>
>>
>> Most of Solr's functionality is provided by Lucene.  Lucene is a java
>API
>> that implements search functionality.  Solr bolts on some
>functionality on
>> top of Lucene, but doesn't really do anything to fundamentally change
>the
>> fact that you're dealing with a Lucene index.  So I'm going to mostly
>talk
>> about Lucene below.
>>
>> Lucene organizes data in a unit that we call a "document." An easy
>analogy
>> for this is that it is a lot like a row in a single database table. 
>It has
>> fields, each field has a type. Unless custom software is used, there
>is
>> really no support for data other than basic primitive types --
>numbers and
>> strings.  The only complex type that I can think of that Solr
>supports out
>> of the box is geospatial coordinates, and it might even support
>> multi-dimensional coordinates, but I'm not sure.  It's not all that
>complex
>> -- the field just stores and manipulates multiple numbers instead of
>one.
>> The Lucene API does support a FEW things that Solr doesn't implement.
> I
>> don't think those are applicable to what you're trying to do.
>>
>> Let's look at the first part of the data that you included in the
>first
>> message:
>>
>> 8=FIX.4.4 9=653 35=RIO
>>
>> Is "8" always a mixture of letters and numbers and periods? Is "9"
>always
>> a number, and is it always a WHOLE number?  Is "35" always letters?
>> Looking deeper to data that I didn't quote ... is "122" always a
>date/time
>> value?  Are the tag numbers always picked from a well-defined set, or
>do
>> they change?
>>
>> Assuming that the answers in the previous paragraph are found and a
>> configuration is created to deal with all of it ... how are you
>planning to
>> search it?  What kind of queries would you expect somebody to make? 
>That's
>> going to have a huge influence on how you configure things.
>>
>> Writing the schema is usually where people spend the most time when
>> they're setting up Solr.
>>
>> Thanks,
>> Shawn
>>
>>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: How do I create a schema file for FIX data in Solr

2018-04-02 Thread Rick Leir
Ray
Have you looked around for an existing FIX to Solr conduit? If FIX is a common 
standard then I would expect that someone has done some work on this and 
github'd it.

Even just FIX to JSON.
Cheers -- Rick


-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Need help to get started on Solr, searching get nothing. Thank you very much in advance

2018-04-02 Thread Rick Leir
Raymond
There is a default search field, set by the df parameter. You would normally 
use copyField to copy all searchable fields into that default field.
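
For example, with a managed schema you can add a catch-all copyField through 
the Schema API. A sketch in Python; the collection name "films" is from your 
example, and "_text_" is, I believe, the catch-all field that recent example 
configsets use as the df:

import requests

# Copy every field into the _text_ catch-all, so unfielded queries
# (q=batman) match anywhere. Requires a reindex to take effect.
requests.post(
    "http://localhost:8983/solr/films/schema",
    json={"add-copy-field": {"source": "*", "dest": "_text_"}},
).raise_for_status()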
Cheers -- Rick

On April 1, 2018 11:34:07 PM EDT, Raymond Xie <xie3208...@gmail.com> wrote:
>Hi Rick,
>
>I sorted it out half:
>
>I should have specified the field in the search query, so, instead of
>http://localhost:8983/solr/films/browse?q=batman, I should use:
>http://localhost:8983/solr/films/browse?q=name:batman
>
>Sorry for this newbie mistake.
>
>But what if I (or the user) don't know, or don't want, to restrict the
>search scope to the field "name", and instead want to match anywhere in
>the indexed documents?
>
>
>**
>*Sincerely yours,*
>
>
>*Raymond*
>
>On Sun, Apr 1, 2018 at 2:10 PM, Rick Leir <rl...@leirtech.com> wrote:
>
>> Raymond
>> The output is not visible to me because the mailing list strips
>images.
>> Please try a different way to show the output.
>> Cheers -- Rick
>>
>> On March 29, 2018 10:17:13 PM EDT, Raymond Xie <xie3208...@gmail.com>
>> wrote:
>> > I am new to Solr, following Steve Rowe's example on
>>
>>https://github.com/apache/lucene-solr/tree/master/solr/example/films:
>> >
>> >It would be greatly appreciated if anyone can enlighten me where to
>> >start
>> >troubleshooting, thank you very much in advance.
>> >
>> >The steps I followed are:
>> >
>> >Here ya go << END_OF_SCRIPT
>> >
>> >bin/solr stop
>> >rm server/logs/*.log
>> >rm -Rf server/solr/films/
>> >bin/solr start
>> >bin/solr create -c films
>> >curl http://localhost:8983/solr/films/schema -X POST -H
>> >'Content-type:application/json' --data-binary '{
>> >"add-field" : {
>> >"name":"name",
>> >"type":"text_general",
>> >"multiValued":false,
>> >"stored":true
>> >},
>> >"add-field" : {
>> >"name":"initial_release_date",
>> >"type":"pdate",
>> >"stored":true
>> >}
>> >}'
>> >bin/post -c films example/films/films.json
>> >curl http://localhost:8983/solr/films/config/params -H
>> >'Content-type:application/json'  -d '{
>> >"update" : {
>> >  "facets": {
>> >"facet.field":"genre"
>> >}
>> >  }
>> >}'
>> >
>> ># END_OF_SCRIPT
>> >
>> >Additional fun -
>> >
>> >Add highlighting:
>> >curl http://localhost:8983/solr/films/config/params -H
>> >'Content-type:application/json'  -d '{
>> >"set" : {
>> >  "browse": {
>> >"hl":"on",
>> >"hl.fl":"name"
>> >}
>> >  }
>> >}'
>> >try http://localhost:8983/solr/films/browse?q=batman now, and you'll
>> >see "batman" highlighted in the results
>> >
>> >
>> >
>> >I got nothing in my search:
>> >
>> >
>> >
>> >
>> >**
>> >*Sincerely yours,*
>> >
>> >
>> >*Raymond*
>>
>> --
>> Sorry for being brief. Alternate email is rickleir at yahoo dot com

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Need help to get started on Solr, searching get nothing. Thank you very much in advance

2018-04-01 Thread Rick Leir
Raymond
The output is not visible to me because the mailing list strips images. Please 
try a different way to show the output.
Cheers -- Rick

On March 29, 2018 10:17:13 PM EDT, Raymond Xie  wrote:
> I am new to Solr, following Steve Rowe's example on
>https://github.com/apache/lucene-solr/tree/master/solr/example/films:
>
>It would be greatly appreciated if anyone can enlighten me where to
>start
>troubleshooting, thank you very much in advance.
>
>The steps I followed are:
>
>Here ya go << END_OF_SCRIPT
>
>bin/solr stop
>rm server/logs/*.log
>rm -Rf server/solr/films/
>bin/solr start
>bin/solr create -c films
>curl http://localhost:8983/solr/films/schema -X POST -H
>'Content-type:application/json' --data-binary '{
>"add-field" : {
>"name":"name",
>"type":"text_general",
>"multiValued":false,
>"stored":true
>},
>"add-field" : {
>"name":"initial_release_date",
>"type":"pdate",
>"stored":true
>}
>}'
>bin/post -c films example/films/films.json
>curl http://localhost:8983/solr/films/config/params -H
>'Content-type:application/json'  -d '{
>"update" : {
>  "facets": {
>"facet.field":"genre"
>}
>  }
>}'
>
># END_OF_SCRIPT
>
>Additional fun -
>
>Add highlighting:
>curl http://localhost:8983/solr/films/config/params -H
>'Content-type:application/json'  -d '{
>"set" : {
>  "browse": {
>"hl":"on",
>"hl.fl":"name"
>}
>  }
>}'
>try http://localhost:8983/solr/films/browse?q=batman now, and you'll
>see "batman" highlighted in the results
>
>
>
>I got nothing in my search:
>
>
>
>
>**
>*Sincerely yours,*
>
>
>*Raymond*

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: How do I create a schema file for FIX data in Solr

2018-04-01 Thread Rick Leir
Raymond
Folks are quiet, maybe because of Easter.
Solr has a RESTful interface, and all the details are in the manual. Briefly, 
you need to create a JSON document containing all the fields in a FIX message, 
then POST it to Solr. POST all your FIX messages to Solr, perhaps in batches. 
Then search for a message using a GET request. At this point you should be able 
to build a basic system with a bit of reading in the manual, and when you have 
something basic working, come back and we can refine it.
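
A rough sketch in Python of that flow; the tag-to-field mapping, the "fix" 
collection name, and the space delimiter (real FIX uses the SOH \x01 character) 
are all assumptions for illustration:

import requests

TAG_NAMES = {"8": "begin_string_s", "9": "body_length_i", "35": "msg_type_s"}

def fix_to_doc(msg_id, raw, delim=" "):
    # Turn "tag=value" pairs into a flat JSON document.
    doc = {"id": msg_id, "raw_txt": raw}
    for pair in raw.split(delim):
        tag, _, value = pair.partition("=")
        name = TAG_NAMES.get(tag)
        if name:
            doc[name] = int(value) if name.endswith("_i") else value
    return doc

# POST a batch of FIX messages as JSON documents, then commit:
docs = [fix_to_doc("msg-1", "8=FIX.4.4 9=653 35=RIO")]
requests.post("http://localhost:8983/solr/fix/update?commit=true",
              json=docs).raise_for_status()

# Search for a message with a GET request:
found = requests.get("http://localhost:8983/solr/fix/select",
                     params={"q": "msg_type_s:RIO"}).json()
print(found["response"]["numFound"])

Cheers -- Rick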

On March 31, 2018 2:33:25 PM EDT, Walter Underwood  
wrote:
>Looks like Financial Information Exchange data, but, as Shawn says, the
>real problem is what you want to do with it.
>
>* What fields will be searched? Those are indexed.
>* What fields will be returned in the result? Those are stored.
>* What is the data type for each field?
>
>I often store the data for most of the fields because it makes
>debugging search problems so much easier.
>
>wunder
>Walter Underwood
>wun...@wunderwood.org
>http://observer.wunderwood.org/  (my blog)
>
>> On Mar 31, 2018, at 11:29 AM, Shawn Heisey 
>wrote:
>> 
>> On 3/31/2018 12:21 PM, Raymond Xie wrote:
>>> I just started using Solr to create a Searching function on our
>existing
>>> data.
>>> 
>>> The existing data is in FIX format sample as below:
>> 
>>> all the red tags (I didn't mark all of them) are fields with
>definition
>>> from FIX standard, I need to create index on all the tags, how do I
>start?
>> 
>> I do not know what FIX means, and there are no colors in your email.
>> 
>> Can you elaborate?
>> 
>> Fine-tuning the schema can be one of the most time-consuming parts of
>setting up a Solr installation, and there are usually no easy quick
>answers.  Exactly what to do will depend not only on the data that
>you're indexing, but also what you want to do with it.
>> 
>> Thanks,
>> Shawn
>> 

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: How do I create a schema file for FIX data in Solr

2018-03-31 Thread Rick Leir
Raymond
Will you be streaming the FIX data, perhaps with aggregation? Just a thought, I 
have no experience with FIX. Streaming opens up lots of questions.
Cheers -- Rick

On March 31, 2018 2:33:25 PM EDT, Walter Underwood  
wrote:
>Looks like Financial Information Exchange data, but, as Shawn says, the
>real problem is what you want to do with it.
>
>* What fields will be searched? Those are indexed.
>* What fields will be returned in the result? Those are stored.
>* What is the data type for each field?
>
>I often store the data for most of the fields because it makes
>debugging search problems so much easier.
>
>wunder
>Walter Underwood
>wun...@wunderwood.org
>http://observer.wunderwood.org/  (my blog)
>
>> On Mar 31, 2018, at 11:29 AM, Shawn Heisey 
>wrote:
>> 
>> On 3/31/2018 12:21 PM, Raymond Xie wrote:
>>> I just started using Solr to create a Searching function on our
>existing
>>> data.
>>> 
>>> The existing data is in FIX format sample as below:
>> 
>>> all the red tags (I didn't mark all of them) are fields with
>definition
>>> from FIX standard, I need to create index on all the tags, how do I
>start?
>> 
>> I do not know what FIX means, and there are no colors in your email.
>> 
>> Can you elaborate?
>> 
>> Fine-tuning the schema can be one of the most time-consuming parts of
>setting up a Solr installation, and there are usually no easy quick
>answers.  Exactly what to do will depend not only on the data that
>you're indexing, but also what you want to do with it.
>> 
>> Thanks,
>> Shawn
>> 

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Add remote ip address in solr log

2018-03-29 Thread Rick Leir
Vince
Something as simple as an Apache ProxyPass (a reverse proxy in front of Solr) 
would help; then your Apache access log would record the remote IP.
Cheers -- Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Solr on HDInsight to write to Active Data Lake

2018-03-28 Thread Rick Leir
Hi,
The class that is not found is likely in the Azure related libraries. As Erick 
said, are you sure that you have a library containing it?
Cheers
Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Some performance questions....

2018-03-23 Thread Rick Leir


Deep,
What is the test, so I can try it?

75 or 90 ms ... is that the JVM startup time?
Cheers -- Rick
>>
>>
>I have stated the numbers which I found during my test. The best way to
>verify them is for someone else to run the same test. Otherwise I don't
>see
>how we can verify the results


-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com


Re: Solr on HDInsight to write to Active Data Lake

2018-03-23 Thread Rick Leir
Abhi
Check your lib directives. 
https://lucene.apache.org/solr/guide/6_6/lib-directives-in-solrconfig.html#lib-directives-in-solrconfig

I suspect your jars are not in a lib dir mentioned in solrconfig.xml
Cheers -- Rick

On March 23, 2018 11:12:17 AM EDT, Abhi Basu <9000r...@gmail.com> wrote:
>MS Azure does not support Solr 4.9 on HDI, so I am posting here. I
>would
>like to write index collection data to HDFS (hosted on ADL).
>
>Note: I am able to get to ADL from the hadoop fs command line, so hadoop is
>configured correctly to get to ADL:
>hadoop fs -ls adl://
>
>This is what I have done so far:
>1. Copied all required jars to the solr ext lib folder:
>sudo cp -f /usr/hdp/current/hadoop-client/*.jar
>/usr/hdp/current/solr/example/lib/ext
>sudo cp -f /usr/hdp/current/hadoop-client/lib/*.jar
>/usr/hdp/current/solr/example/lib/ext
>sudo cp -f /usr/hdp/current/hadoop-hdfs-client/*.jar
>/usr/hdp/current/solr/example/lib/ext
>sudo cp -f /usr/hdp/current/hadoop-hdfs-client/lib/*.jar
>/usr/hdp/current/solr/example/lib/ext
>sudo cp -f
>/usr/hdp/current/storm-client/contrib/storm-hbase/storm-hbase*.jar
>/usr/hdp/current/solr/example/lib/ext
>sudo cp -f /usr/hdp/current/phoenix-client/lib/phoenix*.jar
>/usr/hdp/current/solr/example/lib/ext
>sudo cp -f /usr/hdp/current/hbase-client/lib/hbase*.jar
>/usr/hdp/current/solr/example/lib/ext
>
>This includes the Azure active data lake jars also.
>
>2. Edited my solr-config.xml file for my collection:
>
>${solr.core.name}/data/
>
>class="solr.HdfsDirectoryFactory">
>name="solr.hdfs.home">adl://esodevdleus2.azuredatalakestore.net/clusters/esohadoopdeveus2/solr/
>  /usr/hdp/2.6.2.25-1/hadoop/conf
>name="solr.hdfs.blockcache.global">${solr.hdfs.blockcache.global:true}
>  true
>  1
> true
>  16384
>  true
>  true
>  16
>
>
>
>When this collection is deployed to solr, I see this error message:
>
>
>
>0
>2189
>
>org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
>CREATEing SolrCore 'ems-collection_shard2_replica2':
>Unable to create core: ems-collection_shard2_replica2 Caused by: Class
>org.apache.hadoop.fs.adl.HdiAdlFileSystem not
>foundorg.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
>CREATEing SolrCore 'ems-collection_shard2_replica1': Unable to create
>core: ems-collection_shard2_replica1 Caused by: Class
>org.apache.hadoop.fs.adl.HdiAdlFileSystem not
>foundorg.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
>CREATEing SolrCore 'ems-collection_shard1_replica1': Unable to create
>core: ems-collection_shard1_replica1 Caused by: Class
>org.apache.hadoop.fs.adl.HdiAdlFileSystem not
>foundorg.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
>CREATEing SolrCore 'ems-collection_shard1_replica2': Unable to create
>core: ems-collection_shard1_replica2 Caused by: Class
>org.apache.hadoop.fs.adl.HdiAdlFileSystem not found
>
>
>
>
>Has anyone done this and can help me out?
>
>Thanks,
>
>Abhi
>
>
>-- 
>Abhi Basu

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Looking for design ideas

2018-03-18 Thread Rick Leir
Steve
Does a document have a different URL when it is in a personal DB? 

I suspect the easiest solution is to use just one index.

You can have a field containing an integer identifying the personal DB. For 
public, set this to zero. Call it DBid. Update the doc to change this and the 
URL when the user starts editing.

Then the query contains the userid, and you boost on this field. Or something 
like that.
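
A sketch of what that query might look like from Python, with assumed names 
(the "docs" collection, a DBid field, and a user whose personal DB id is 42); 
treat it as a starting point, not a recipe:

import requests

def search(user_db_id, user_query):
    params = {
        "defType": "edismax",
        "q": user_query,
        # Public docs (DBid:0) plus this user's personal copies only:
        "fq": "DBid:(0 OR {})".format(user_db_id),
        # Rank the user's own checked-out copies first:
        "bq": "DBid:{}^10".format(user_db_id),
    }
    return requests.get("http://localhost:8983/solr/docs/select",
                        params=params).json()

print(search(42, "quarterly report")["response"]["numFound"])

Note the boost only ranks the personal copy higher; actually hiding the public 
copy while it is checked out would rely on updating that document in place, as 
above.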
Cheers -- Rick


On March 18, 2018 11:13:49 AM EDT, Steven White  wrote:
>Hi everyone,
>
>I have a design problem that I'm not sure how to solve best, so I
>figured I'd share it here and see what ideas others may have.
>
>I have a DB that hold documents (over 1 million and growing).  This is
>known as the "Public" DB that holds documents visible to all of my end
>users.
>
>My application let users "check-out" one or more documents at a time
>off
>this "Public" DB, edit them and "check-in" back into the "Public" DB. 
>When
>a document is checked-out, it goes into a "Personal" DB for that user
>(and
>the document in the "Public" DB is flagged as such to alert other
>users.)
>The owner of this checked-out document in the "Personal" DB can make
>changes to the document and save it back into the "Personal" DB as
>often as
>he wants to.  Sometimes the document lives in the "Personal" DB for few
>minutes before it is checked-in back into the "Public" DB and sometimes
>it
>can live in the "Personal" DB for 1 day or 1 month.  When a document is
>saved into the "Personal" DB, only the owner of that document can see
>it.
>
>Currently there are 100 users but this will grow to at least 500 or
>maybe
>even 1000.
>
>I'm looking at a solution on how to enable a full text search on those
>documents, both in the "Public" and "Personal" DB so that:
>
>1) Documents in the "Public" DB are searchable by all users.  This is
>the
>easy part.
>
>2) Documents in the "Personal" DB of each user is searchable by the
>owner
>of that "Personal" DB.  This is easy too.
>
>3) A user can search both the "Public" and "Personal" DB at anytime but
>if
>a document is in the "Personal" DB, we will not search it the "Public"
>--
>i.e.: whatever is in "Personal" DB takes over what's in the "Public"
>DB.
>
>Item #3 is important and is what I'm trying to solve.  The goal is to
>give
>hits to the user on documents that they are editing (in their
>"Personal"
>DB) instead of that in the "Public".
>
>The way I'm thinking to solve this problem is to create 2 Solr indexes
>(do
>we call those "cores"?):
>
>1) The "Public" DB is indexed into the "Public" Solr index.
>
>2) The "Personal" DB is indexed into the "Personal" Solr index with a
>field
>indicating the owner of that document.
>
>With the above 2 indexes, I can now send the user's search syntax to
>both
>indexes but for the "Public", I will also send a list of IDs (those
>documents in the user's "Personal" DB) to exclude from the result set.
>This way, I let a user search both the "Public" and "Personal" DB as
>such
>the documents in the "Personal" DB are included in the search and are
>excluded from the "Public" DB.
>
>Did I make sense?  If so, is this doable?  Will ranking be affected
>given that I'm searching 2 indexes?
>
>Let me know what issues I might be overlooking with this solution.
>
>Thanks
>
>Steve

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Expose a metric for percentage-recovered during full recoveries

2018-03-15 Thread Rick Leir
S
Were there errors in the logs just before recoveries?
Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Solr on DC/OS ?

2018-03-14 Thread Rick Leir
Søren,
DC/OS installs on top of Ubuntu or RedHat, and it is used to coordinate many 
machines so they appear as a cluster. 

Solr needs to be on a single machine, or in the case of SolrCloud, on many 
machines. It has no need of the coordination which DC/OS provides. Solr depends 
on direct access to lots of memory, and if any coordination layer attempts to 
mediate access to the memory then Solr would slow down. I recommend you install 
Solr directly on Ubuntu or RedHat or Windows Server. (Disclosure: I know very 
little about DC/OS.)
Cheers -- Rick


On March 14, 2018 6:19:22 AM EDT, "Søren"  wrote:
>Hi, has anyone experience in running solr on DC/OS?
>
>If so, how is that achieved succesfully? Solr is not in Universe.
>
>Thanks in advance,
>Soren

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: SynonymGraphFilterFactory with WordDelimiterGraphFilterFactory usage

2018-03-14 Thread Rick Leir
Jay
Did you try using text_en_splitting copied out of another release? 
Though if someone went to the trouble of removing it from the example, there 
could be something broken in it. 
Cheers -- Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: How to store files larger than zNode limit

2018-03-14 Thread Rick Leir
Could you manage userdict using Puppet or Ansible? Or whatever your automation 
system is. 
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

RE: How to store files larger than zNode limit

2018-03-14 Thread Rick Leir
Markus, Atita
We set it higher too. 

When ZK is recovering from a disconnected state it re-sends all the messages 
that it had been trying to send while the machines were disconnected. Is this 
stored in a 'transaction log' .tlog file? I am not clear on this. ZK also goes 
through the unsent messages when Solr starts up, and startup can take a while 
longer.

With this in mind, it might make more sense to use zk for kbyte sized  blobs. 
But machines are faster every year, so maybe Meg and Gig blobs will be 
appropriate. 
Cheers -- Rick

On March 13, 2018 5:56:56 PM EDT, Markus Jelsma  
wrote:
>Hi - For now, the only option is to allow larger blobs via
>jute.maxbuffer (whatever jute means). Despite ZK being designed for kb
>sized blobs, Solr demands us to abuse it. I think there was a ticket
>for compression support, but that only stretches the limit.
>
>We are running ZK with 16 MB for maxbuffer. It holds the large
>dictionaries, it runs fine. 
>
>Regards,
>Markus
> 
>-Original message-
>> From:Atita Arora 
>> Sent: Tuesday 13th March 2018 22:38
>> To: solr-user@lucene.apache.org
>> Subject: How to store files larger than zNode limit
>> 
>> Hi ,
>> 
>> I have a use case supporting multiple clients and multiple languages
>in a
>> single application.
>> So , In order to improve the language support, we want to leverage
>the Solr
>> dictionary (userdict.txt) files as large as 10MB.
>> I understand that ZooKeeper's default zNode file size limit is 1MB.
>> I'm not sure sure if someone tried increasing it before and how does
>that
>> fares in terms of performance.
>> Looking at -
>https://zookeeper.apache.org/doc/r3.2.2/zookeeperAdmin.html
>> It states -
>> Unsafe Options
>> 
>> The following options can be useful, but be careful when you use
>them. The
>> risk of each is explained along with the explanation of what the
>variable
>> does.
>> jute.maxbuffer:
>> 
>> (Java system property:* jute.maxbuffer*)
>> 
>> This option can only be set as a Java system property. There is no
>> zookeeper prefix on it. It specifies the maximum size of the data
>that can
>> be stored in a znode. The default is 0xfffff, or just under 1M. If
>this
>> option is changed, the system property must be set on all servers and
>> clients otherwise problems will arise. This is really a sanity check.
>> ZooKeeper is designed to store data on the order of kilobytes in
>size.
>> I would appreciate if someone has any suggestions  on what are the
>best
>> practices for handling large config/dictionary files in ZK?
>> 
>> Thanks ,
>> Atita
>> 

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Navigation/Paging

2018-03-13 Thread Rick Leir
Sebastien
Can you not just handle this in your JavaScript? Your request will always get 
15 rows: start=0, then start=15, and so on. In the detail view you only show 
one of the documents, of course, and when the user is viewing the last of the 
15 and clicks Next, you request the next 15. When viewing the first of the 15 
and clicking Previous, you request the previous 15.
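
The arithmetic is trivial; a tiny sketch with assumed names:

PAGE_SIZE = 15

def page_params(query, start):
    # Clamp at zero so Previous from the first page stays on it.
    return {"q": query, "rows": PAGE_SIZE, "start": max(0, start)}

current_start = 0
next_page = page_params("name:batman", current_start + PAGE_SIZE)
prev_page = page_params("name:batman", current_start - PAGE_SIZE)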
Am I missing something here?
Rick

On March 13, 2018 12:26:18 PM EDT, Sebastian Riemer  wrote:
>Hi,
>
>In our web app, when displaying result lists from solr,  we've
>successfully introduced paging via the params 'start' and 'rows' and
>it's working quite well.
>
>Our navigation in list screens look like this:
>
>
><< First   < Prev   1 - 15 of 62181   Next >   Last >>
>
>One can navigate to the first page, previous page, next page and last
>page. All is done via adapting the param "start" accordingly by simply
>adding the page size.
>
>However, now we want to introduce a similar navigation in our detail
>views, where only ever one document is displayed. Again, the navigation
>bar looks like this:
>
><< First   < Prev   1 - 15 of 62181   Next >   Last >>
>
>But now, Prev / Next shall open up the previous / next _document_
>instead of the next page. The same goes for First and Last, it shall
>open the first / last _document_ not the page.
>
>Our first approach to this was to simply add the param "fl=id" so we
>only get the IDs of documents and set page size to ALL (i.e. no
>restriction on param "rows"). That way, it was easy to extract the
>current document id from the result list, and check which id was
>preceding and succeeding the current id, as well as getting the very
>first id and the very last id, in order to render the navigation bar.
>
>This led to Solr being heavily loaded, since it must load 62181
>documents (in this example) in order to return the ids. I somehow
>thought this would be easy for Solr to do, but it isn't.
>
>Our second approach was, to simply keep the same value for params
>"start" and "rows" since the user is always selecting a document from
>the list - thus the selected document already is within the page.
>However, the edge cases are, the selected document is the very first on
>the page or the very last one, thus the previous or next document id is
>not within the page result from solr -> I guess this we could handle by
>simply checking and sending a second query where the param "start"
>would be adjusted accordingly.
>
>However I would not know how to retrieve the id of the very first
>document and the very last document (except for executing separate
>queries with I guess start=0, rows=1 and start=62181 and rows=1)
>
>TL,DR:
>For any query and a documentId (of which it is known it is within the
>query result), what is a simple and efficient enough way, to get the
>following navigational information:
>
>-  Previous document Id
>
>-  Next document id
>
>-  First document id
>
>-  Last document id
>
>Can this sort of requirement be handled within one solr query? Should I
>user cursorMark in this scenario?
>
>Best regards,
>
>Sebastian

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Altering the query if query contains all stopwods

2018-03-09 Thread Rick Leir
Tav, Ryan
Now you have me wondering: should it return *:* or some general landing 
page?

Suppose you had typeahead or autocomplete, it should ignore any stopwords list.

By the way, people on this list have had good reasons why we should stop using 
stopwords.
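
If you do keep them, here is a sketch of the fallback we are wondering about; 
the stopword set is illustrative:

STOPWORDS = {"a", "an", "and", "is", "of", "the", "to", "what"}

def effective_query(user_query):
    # If every token is a stopword, fall back to match-all rather
    # than returning zero hits.
    tokens = user_query.lower().split()
    if tokens and all(t in STOPWORDS for t in tokens):
        return "*:*"  # or redirect to a general landing page instead
    return user_query

print(effective_query("what is the"))   # -> *:*
print(effective_query("what is solr"))  # -> what is solr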
Cheers -- Rick

On March 9, 2018 1:13:22 PM EST, tapan1707  wrote:
>Hello Ryan,
>Solr has a filter class called solr.SuggestStopFilterFactory, which
>basically works like solr.StopFilterFactory, but with a slight
>modification: if all of the words are present in stopwords.txt, it
>won't remove the last one.
>I am not sure about wildcard search, but if all of the query tokens are
>in stopwords.txt then at the very least it won't return zero
>results (assuming that search results for the last word exist).
>
>
>
>--
>Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Indexing nested json

2018-03-08 Thread Rick Leir
Hi James
Yonik has a great blog post explaining that, but I am on the bus so I do not 
have a link for you.

Yes, you can use nesting, and there are good reasons for doing so, but you will 
find it much easier to use flat fields. YMMV
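
If you go flat, here is a sketch of one way to flatten nested JSON into 
path-named fields before posting; the naming convention is just an assumption:

def flatten(obj, prefix="", out=None):
    # Recursively turn nested dicts/lists into flat path-named fields.
    out = {} if out is None else out
    if isinstance(obj, dict):
        for key, value in obj.items():
            flatten(value, f"{prefix}{key}_", out)
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            flatten(value, f"{prefix}{i}_", out)
    else:
        out[prefix.rstrip("_")] = obj
    return out

doc = {"rolledupcolors": [{"Name": "BURGUNDY", "ColorFamily": "Red,Purple"}]}
print(flatten(doc))
# {'rolledupcolors_0_Name': 'BURGUNDY',
#  'rolledupcolors_0_ColorFamily': 'Red,Purple'}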
Cheers -- Rick


On March 8, 2018 5:22:13 PM EST, "kasinger, james" 
 wrote:
>Not quite. This will index the nested json into a flattened
>representation of the data, in multiple documents. We expect the
>resulting document to contain all the same nested fields as the json
>had. It should be identical. 
>
>Thanks for your response,
>James
>
>On 3/8/18, 1:26 PM, "Mikhail Khludnev"  wrote:
>
>Will
>https://lucene.apache.org/solr/guide/7_1/transforming-and-indexing-custom-json.html
>work
>for you?
>
>On Thu, Mar 8, 2018 at 8:17 PM, kasinger, james <
>james.kasin...@nordstrom.com> wrote:
>
>> Hi folks,
>> Has anyone had success indexing nested json into solr? I know that
>solr
>> prefers a flattened representation of the data, but I’m exploring
>potential
>> solutions or workarounds for achieving this. Thanks in advance.
>>
>> For instance I’m indexing this “document” and expect it to be
>presented in
>> solr in the same way.
>>
>> {
>> "rolledupcolors": [
>>  {
>> "Name": "BURGUNDY",
>> "ManiImageUrl":"1/_102069221.jpg",
>> "AltImageUrl":"3/_102067603.jpg",
>> "RGB":"",
>> "ColorFamily":"Red,Purple",
>> "SwatchImageUrl":"2/_102067602.jpg"
>>   },
>>   {
>>   "Name": "CHARCOAL",
>>
>>   }
>>   ]
>> }
>>
>> James
>>
>>
>>
>
>
>-- 
>Sincerely yours
>Mikhail Khludnev
>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

RE: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms

2018-03-08 Thread Rick Leir
David
Yes, highlighting is tricky, especially with synonyms. Sorry, I would need to 
see a bit more of your config before saying more about it.
Thanks -- Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms

2018-03-08 Thread Rick Leir
David
When you have "lcx__balmoral__cannum__clear_lake__lower_norton" in a field, 
would you search for *cannum*? That might not perform well. 
Why not use a multivalued field for this information? 

It could be that you have a good reason for this, and I just do not understand.
Cheers -- Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Atomic updates using solr-php-client

2018-03-06 Thread Rick Leir
Sami
Why not do the simple case first, with complete document updates. When you have 
that working, you can decide if you want atomic updates too.
Cheers -- Rick

On March 6, 2018 2:26:50 AM EST, Sami al Subhi  wrote:
>Thank you for replying,
>
>Yes that is the one. Unfortunately there is no documentation for this
>library.
>
>I tried to implement other libraries but I couldn't get them running.
>This
>is the easiest library to implement but lacks support and
>documentation.
>
>Thank you and best regards,
>Sami
>
>
>
>--
>Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Need a Query syntax for fetching results

2018-03-06 Thread Rick Leir
Hi Raj
Maybe this would be what you need.
"Keyword Tokenizer
This tokenizer treats the entire text field as a single token."
There used to be an example showing the use of this in schema.xml, but I am 
away from my computer so it is hard to check.
And everything Emir says is spot-on.
Then you might want to go further with ngrams or a spelling check so the user 
need not be perfect.
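
As a sketch, assuming a managed schema: the Schema API can add a 
KeywordTokenizer-based field type. The type name "string_ci" and the 
collection name "orgs" here are illustrative:

import requests

# One lowercased token per value: whole-value matches and trailing-
# wildcard prefix queries work; mid-string matches do not.
field_type = {
    "add-field-type": {
        "name": "string_ci",
        "class": "solr.TextField",
        "analyzer": {
            "tokenizer": {"class": "solr.KeywordTokenizerFactory"},
            "filters": [{"class": "solr.LowerCaseFilterFactory"}],
        },
    }
}
requests.post("http://localhost:8983/solr/orgs/schema",
              json=field_type).raise_for_status()

# After adding a field of this type (say orgname_exact) and reindexing,
# orgname_exact:abc* matches "ABC test limited" but not "DEF ABC limited".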
Cheers -- Rick


On March 6, 2018 5:40:02 AM EST, "Emir Arnautović" 
 wrote:
>Hi Raj,
>You need to get familiar with Solr analysis chain:
>https://lucene.apache.org/solr/guide/6_6/understanding-analyzers-tokenizers-and-filters.html
>
>
>When playing with it, use admin console analysis tab to see what tokens
>are produced.
>
>And you need to understand your search requirements and cover them with
>one or more fields. Note that you can use copyField to index the same
>content in different ways to handle different search requirements.
>
>It is probably not what you want, but based on what you described, you
>do not care about anything but the first token in your field, so you
>can use LimitTokenCountFilter to index only the first token. In query
>analysis you do not use it and with default operator OR you will get
>what you want.
>
>HTH,
>Emir
>--
>Monitoring - Log Management - Alerting - Anomaly Detection
>Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 6 Mar 2018, at 10:55, Rajvinder Pal 
>wrote:
>> 
>> Hi ,
>> I am new to Lucene. I have a requirement where, when I request an
>> organization name, it should show the matching organization names.
>> 
>> I have written the q param as
>> 
>> orgname_text: ABC test
>> 
>> it is returning the result :-
>> 
>> ABC test limited
>> ABC XYZ limited
>> DEF ABC limited
>> test limited
>> 
>> I want all the matching results which start with either ABC or test,
>> so here I don't want DEF ABC limited. Please let me know what feature
>> or syntax I should use to get the required result.
>> 
>> Thanks
>> Raj

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Alias field names when searching (not for results)

2018-03-06 Thread Rick Leir
Christopher
The first thing that came to mind is that you are planning not to have an app 
in front of Solr. Without a web app, you will need to trust whoever can get 
access to Solr. Maybe you are on an intranet.
Thanks -- Rick

On March 6, 2018 2:42:26 AM EST, "Emir Arnautović" 
 wrote:
>Hi,
>I did not try it, but the first thing that came to my mind is to use
>edismax’s ability to define field aliases, something like
>f.f1.fq=field_1. Note that it is not recommended to have field name
>starting with number so not sure if it will work with “1”.
>
>HTH,
>Emir
>--
>Monitoring - Log Management - Alerting - Anomaly Detection
>Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 5 Mar 2018, at 17:51, Christopher Schultz
> wrote:
>> 
>> All,
>> 
>> I'd like for users to be able to search a field by multiple names
>> without performing a "copy-field" when analyzing a document. Is that
>> possible? Whenever I search for "solr alias field" I get results
>about
>> how to re-name fields in the results.
>> 
>> Here's what I'd like to do. Let's say I have a document:
>> 
>> {
>>  id: 1234,
>>  field_1: valueA,
>>  field_2: valueB,
>>  field_3: valueC
>> }
>> 
>> I'd like users to be able to find this document using any of the
>> following queries:
>> 
>>   field_1:valueA
>>   f1:valueA
>>   1:valueA
>> 
>> I just want the query parser to say "oh, 'f1' is an alias for
>> 'field_1'" and substitute that when performing the search. Is that
>> possible?
>> 
>> -chris
>> 

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Need help with match contains query in SOLR

2018-03-01 Thread Rick Leir
Hi
Would a pf2 boost suit your needs? You would match loosely on any term, and 
your results containing bigrams would be at the top. 
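
A sketch of the request, with an assumed collection name ("faq") and the 
searchTerms field taken from your example:

import requests

params = {
    "defType": "edismax",
    "q": "how to transfer responsibility",
    "qf": "searchTerms",       # loose per-term matching
    "pf2": "searchTerms^11",   # boost docs containing adjacent word pairs
    "fl": "id,score",
}
resp = requests.get("http://localhost:8983/solr/faq/select", params=params)
print(resp.json()["response"]["docs"])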
HTH -- Rick

On March 1, 2018 11:54:19 AM EST, bbarani  wrote:
>Hi,
>
>I want to do a complete "phrase contain" match.
>
>For ex:  Value is stored as below in the multivalued field
>
>1
>
>transfer responsibility
>transfer account
>
>
>
>*Positive cases: (when it should return this document)*
>searchTerms:how to transfer responsibility
>searchTerms:show me ways to transfer responsibility
>searchTerms:show me ways to transfer account
>searchTerms:what is transfer accoun
>
>
>*Negative cases: (when it should not return this document)*
>searchTerms:how to transfer
>searchTerms:what is responsibility
>searchTerms:what is account
>searchTerms:what is transfer
>searchTerms:what is transfer and responsibility
>searchTerms:what is transfer and account
>searchTerms:what is account with responsibility
>
>
>
>--
>Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: SOLR Similarity Difference

2018-02-27 Thread Rick Leir
Rick
Did you experiment in the SolrAdmin analysis page? It would possibly tell you 
whether your chain is doing what you expect. Then you need to consider that 
boolean logic is not strictly boolean in Solr. There is a Lucidworks blog which 
explains this nicely; every now and then someone posts the link here.
Cheers -- Rick

On February 26, 2018 5:39:31 PM EST, "Hodder, Rick"  wrote:
>I'm converting SOLR 4.10.2 to SOLR 7.1
>
>I have the following three strings in both SOLR cores
>
>Action Technical Temporaries t/a CTR Corporation
>Action Technical Temporaries
>Action Technical Temporar
>
>If I search
>
>IDX_CompanyName: (Action AND Technical AND Temporaries AND t/a AND CTR
>AND Corporation)
>
>Under 4.10.2 I see all three in the results
>
>Under 7.1, with the default BM25 similarity, I only see the first
>result.
>
>Someone on the list suggested that, to make 7.1 go back to the
>similarity factory used in 4.10.2, I add the following to the
>schema.xml:
>
><similarity class="org.apache.solr.search.similarities.ClassicSimilarityFactory"/>
>
>That brings all three results.
>
>But my boss would prefer that we don't use the older similarity
>factory.
>
>Is there some setting other than similarity factory that will make 7.1
>include these documents without changing the query?
>
>Thanks,
>
>Rick Hodder
>Information Technology
>Navigators Management Company, Inc.
>83 Wooster Heights Road, 2nd Floor
>Danbury, CT  06810
>(475) 329-6251
>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: StandardTokenizer and splitting on mixedcase strings

2018-02-23 Thread Rick Leir
Dan,
Lowercase filter before the tokenizer?
Cheers -- Rick

On February 23, 2018 6:08:27 AM EST, "Dan ."  wrote:
>Hi,
>
>The StandardTokenizerFactory splits strings like 'JavaScript' into
>'Java'
>and 'Script', but then searches with 'javascript' do not match the
>document.
>
>Is there a solr way to prevent StandardTokenizer from splitting
>mixedcase
>strings?
>
>Cheers,
>Dan

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Object not fetched because its identifier appears to be already in processing

2018-02-23 Thread Rick Leir
Ven,
Where do you see that message? 
Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Indexing timeout issues with SolrCloud 7.1

2018-02-23 Thread Rick Leir
Tom
I think you are saying that all updates fail? Need to do a bit of 
troubleshooting. How about queries? What else is in the logs?
Cheers -- Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Filesystems supported by Solr

2018-02-20 Thread Rick Leir
Ritesh
The filesystems you mention are used by Spark so it can stream huge quantities 
of data (corrections please).

By comparison, Solr uses a more 'reasonably' sized filesystem, but needs enough 
memory that all the index data can stay resident. The regular Linux ext3 or 
ext4 is fine.

If you are integrating Solr with Spark, then the filesystems you mention would 
be for Spark not Solr. 
Cheers -- Rick


On February 20, 2018 5:22:33 PM EST, Ritesh Chaman  
wrote:
>Hi team
>
>May I know what all filesystems are supported by Solr. For eg
>ADLS,WASB, S3
>etc. Thanks.
>
>Ritesh

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Index data from mysql DB to Solr - From Scratch

2018-02-17 Thread Rick Leir
Nandan
Work backwards from your results screen. When a user has done a search, what 
information would you like to appear on the screen?

That tells you what your Solr document needs to contain. How will you get that 
information into the Solr document? You will do the SQL select(s) as necessary, 
get the info from MySQL, build a flat JSON record containing all that info for 
one document, and POST it to Solr. Repeat for all documents. Do a commit. 
Sorry, I left out all the details!
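
A rough sketch of those details in Python; the pymysql client, the table, 
column, and collection names are all assumptions for illustration:

import pymysql  # any MySQL client works; pymysql is just one option
import requests

# SELECT the denormalized data, build one flat JSON record per document,
# POST the batch to Solr, then commit.
conn = pymysql.connect(host="localhost", user="shop", password="secret",
                       database="shop")
with conn.cursor() as cur:
    cur.execute("SELECT p.id, p.name, p.description, c.name "
                "FROM products p JOIN categories c ON c.id = p.category_id")
    docs = [{"id": str(pid), "name_txt": name, "description_txt": desc,
             "category_s": cat}
            for pid, name, desc, cat in cur.fetchall()]

requests.post("http://localhost:8983/solr/products/update?commit=true",
              json=docs).raise_for_status()

Cheers -- Rick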

On February 17, 2018 12:56:59 PM EST, "@Nandan@" 
 wrote:
>Hi David ,
>Thanks for your reply.
>My few questions are :-
>1) I have to denormalize my MySQL data manually or some process is
>there.
>2) is it like when Data will insert into my MySQL  , it has to auto
>index
>into solr ?
>
>Please explain these .
>Thanks
>
>On Feb 18, 2018 1:51 AM, "David Hastings"  wrote:
>
>> Your first step is to denormalize your data into a flat data
>structure.
>> Then index that into your solr instance. Then you’re done
>>
>> On Feb 17, 2018, at 12:16 PM, @Nandan@
>> > wrote:
>>
>> Hi Team,
>> I am working on an e-commerce project in which my data is stored in a
>> MySQL DB.
>> Currently we are using MySQL search, but we are planning to implement
>> Solr search to provide our customers more facilities.
>> Just for development purposes, I am trying to run experiments on
>> localhost.
>> Please guide me on how I can achieve this. Please provide some
>> information links which I can refer to, to learn more in detail from
>> scratch.
>>
>> Thanks and Best Regards,
>> Nandan Priyadarshi
>>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Using SolrJ for digest authentication

2018-01-31 Thread Rick Leir
Eddy
Maybe your request is getting through twice. Check your logs to see.
Cheers -- Rick

On January 31, 2018 5:59:53 AM EST, ddramireddy  wrote:
>We are currently deploying Solr in war mode (yes, the recommendation is
>not war, but this is something I can't change now; planned for the
>future). I am setting up authentication for Solr. As the Solr-provided
>basic authentication is not working in Solr 6.4.2, I am setting up
>digest authentication in Tomcat for Solr. I am able to log in to the
>Solr admin application using credentials.
>
>Now from my Java application, when I try to run a query, which will
>delete
>documents in a core, it's throwing following error.
>
>org.apache.http.client.NonRepeatableRequestException: Cannot retry
>request
>with a non-repeatable request entity
>
>I can see in HttpSolrClient, we are setting only basic authentication.
>But,
>I am using Digest auth. Did anyone faced this error before??
>
>This is my code:
>
>public static void main(String[] args) throws ClassNotFoundException,
>SQLException, InterruptedException, IOException, SolrServerException {
>HttpSolrClient solrClient = getSolrHttpClient("solr",
>"testpassword");
>
>try {
>solrClient.deleteByQuery("account", "*:*");
>solrClient.commit("account");
>} catch (final SolrServerException | IOException exn) {
>throw new IllegalStateException(exn);
>}
>}
>
>private static HttpSolrClient getSolrHttpClient(final String userName,
>final
>String password) {
>
>final HttpSolrClient solrClient = new HttpSolrClient.Builder()
>  .withBaseSolrUrl("http://localhost:9000/solr/index.html")
>.withHttpClient(getHttpClientWithSolrAuth(userName,
>password))
>.build();
>
>return solrClient;
>}
>
>private static HttpClient getHttpClientWithSolrAuth(final String
>userName, final String password) {
>   final CredentialsProvider provider = new BasicCredentialsProvider();
>final UsernamePasswordCredentials credentials
>= new UsernamePasswordCredentials(userName, password);
>provider.setCredentials(AuthScope.ANY, credentials);
>
>
>return HttpClientBuilder.create()
>.addInterceptorFirst(new PreemptiveAuthInterceptor())
>.setDefaultCredentialsProvider(provider)
>.build();
>
>}
>
>
>static class PreemptiveAuthInterceptor implements
>HttpRequestInterceptor
>{
>
>DigestScheme digestAuth = new DigestScheme();
>
>PreemptiveAuthInterceptor() {
>
>}
>
>@Override
>   public void process(final HttpRequest request, final HttpContext
>context)
>throws HttpException, IOException {
>final AuthState authState = (AuthState)
>context.getAttribute(HttpClientContext.TARGET_AUTH_STATE);
>
>  if (authState != null && authState.getAuthScheme() == null) {
>final CredentialsProvider credsProvider =
>(CredentialsProvider)
>context.getAttribute(HttpClientContext.CREDS_PROVIDER);
>final HttpHost targetHost = (HttpHost)
>context.getAttribute(HttpCoreContext.HTTP_TARGET_HOST);
> final Credentials creds = credsProvider.getCredentials(new
>AuthScope(targetHost.getHostName(), targetHost.getPort(), "Solr",
>"DIGEST"));
>if (creds == null) {
>System.out.println("No credentials for preemptive
>authentication");
>}
>digestAuth.overrideParamter("realm", "Solr");
>digestAuth.overrideParamter("nonce", Long.toString(new
>Random().nextLong(), 36));
>AuthCache authCache = new BasicAuthCache();
>authCache.put(targetHost, digestAuth);
>
>// Add AuthCache to the execution context
>   HttpClientContext localContext = HttpClientContext.create();
>localContext.setAuthCache(authCache);
>
>  request.addHeader(digestAuth.authenticate(creds, request,
>localContext));
>} else {
>System.out.println("authState is null. No preemptive
>authentication.");
>}
>}
>}
>
>
>
>--
>Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Searching for an efficient and scalable way to filter query results using non-indexed and dynamic range values

2018-01-31 Thread Rick Leir
Luigi
Is there a reason for not indexing all of your on-disk pages? That seems to be 
the first step. But I do not understand what your goal is.
Cheers -- Rick

On January 30, 2018 1:33:27 PM EST, Luigi Caiazza  wrote:
>Hello,
>
>I am working on a project that simulates selective, large-scale
>crawling.
>The system adapts its behaviour according with some external user
>queries
>received at crawling time. Briefly, it analyzes the already crawled
>pages
>in the top-k results for each query, and prioritizes the visit of the
>discovered links accordingly. In a generic experiment, I measure the
>time
>units as the number of crawling cycles completed so far, i.e., with an
>integer value. Finally, I evaluate the experiment by analyzing the
>documents fetched over the crawling cycles. In this work I am using
>Lucene
>7.2.1, but this should not be an issue since I need just some
>conceptual
>help.
>
>In my current implementation, an experiment starts with an empty index.
>When a Web page is fetched during the crawling cycle *x*, the system
>builds
>a document with the URL as StringField, the title and the body as
>TextFields, and *x* as an IntPoint. When I get an external user query,
>I
>submit it  to get the top-k relevant documents crawled so far. When I
>need
>to retrieve the documents indexed from cycle *i* to cycle *j*, I
>execute a
>range query over this last IntPoint field. This strategy does the job,
>but
>of course the write operations take some hours overall for a single
>experiment, even if I crawl just half a million of Web pages.
>
>Since I am not crawling real-time data, but I am working over a static
>set
>of many billions of Web pages (whose contents are already stored on
>disk),
>I am investigating some opportunities to reduce the number of writes
>during
>an experiment. For instance, I could avoid to index everything from
>scratch
>for each run. I would be happy to index all the static contents of my
>dataset (i.e., URL, title and body of a Web page) once and for all.
>Then,
>for a single experiment, I would mark a document as crawled at cycle
>*x* without
>storing this information permanently, in order both to filter out the
>documents that in the current simulation have not been crawled when
>processing the external queries, and to still perform the range queries
>at
>evaluation time. Do you have any idea on how to do that?
>
>Thank you in advance for your support.

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: SolrCloud installation troubles...

2018-01-29 Thread Rick Leir
SELinux? Open file limits? Process limits? 
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: pf2

2018-01-26 Thread Rick Leir
Emir
sow=false ... thanks for this! 

The problem seems to be due to a stopword. Everything is fine when I avoid 
stopwords in my query. The stopword might get removed in the query matching, 
but I would need to allow some slop perhaps for pf2.
Thanks 
Rick

On January 26, 2018 8:14:06 AM EST, "Emir Arnautović" 
<emir.arnauto...@sematext.com> wrote:
>Hi Rick,
>It does not work in any case or it does not work for some cases - e.g.
>something like l’avion? Maybe you can try use sow=false and see if it
>will help.
>
>Cheers,
>Emir
>--
>Monitoring - Log Management - Alerting - Anomaly Detection
>Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 26 Jan 2018, at 13:38, Rick Leir <rl...@leirtech.com> wrote:
>> 
>> Emir
>> Thanks, I will do when I get off this bus.
>> 
>> I have run the text thru the SolrAdmin Analyzer, it looks fine.
>> 
>> According to the debugQuery output, individual words match in the qf,
>but not the pair that pf2 should match.
>> 
>> I compare the configs for English and French, and they are the same
>apart from the analysis chain which is below. Only French fails. I will
>take out filters one by one and attempt to find which is causing this.
>> Cheers -- Rick
>> 
>> On January 26, 2018 4:09:51 AM EST, "Emir Arnautović"
><emir.arnauto...@sematext.com> wrote:
>>> Hi Rick,
>>> Can you include sample of your query and text that should match.
>>> 
>>> Thanks,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training -
>http://sematext.com/
>>> 
>>> 
>>> 
>>>> On 25 Jan 2018, at 23:13, Rick Leir <rl...@leirtech.com> wrote:
>>>> 
>>>> 
>>>> 
>>>> Hi all
>>>> My pf2 keywords^11.0 works for english not for french. Here are the
>>> fieldtypes, actually from two schema.xml's in separate cores. Solr
>>> 5.2.2, edismax, q.op AND
>>>> I suspect there are several problems with the french schema. Maybe
>I
>>> only needed to show the query analyzer, not the index analyzer?
>>>> 
>>>> The pf2 does not show a match in the debugQuery=true output for the
>>> French. However, a qf keywords^10.0 does show a match. The keywords
>>> field is copyfielded into text, which is the df. Is there any other
>>> field I should be showing?
>>>> Thanks
>>>> Rick
>>>> 
>>>> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>>>>  <analyzer type="index">
>>>>   <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>>>>   <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>   <filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms.txt" tokenizerFactory="solr.StandardTokenizerFactory"/>
>>>>   <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
>>>>   <filter class="solr.LowerCaseFilterFactory"/>
>>>>   <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>>>>   <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_en.txt" ignoreCase="true"/>
>>>>   <filter class="solr.SnowballPorterFilterFactory" language="English"/>
>>>>   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>>  </analyzer>
>>>>  <analyzer type="query">
>>>>   <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>>>>   <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>   <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
>>>>   <filter class="solr.LowerCaseFilterFactory"/>
>>>>   <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>>>>   <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_en.txt" ignoreCase="true"/>
>>>>   <filter class="solr.SnowballPorterFilterFactory" language="English"/>
>>>>   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>>  </analyzer>
>>>> </fieldType>
>>>> 
>>>> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>>>>  <analyzer type="index">
>>>>   <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>>>>   <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>   <filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms.txt" tokenizerFactory="solr.StandardTokenizerFactory"/>
>>>>   <filter class="solr.ElisionFilterFactory" articles="lang/contractions_fr.txt"/>
>>>>   <filter class="solr.LowerCaseFilterFactory"/>
>>>>   <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt"/>
>>>>   <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_fr.txt" ignoreCase="true"/>
>>>>   <filter class="solr.FrenchLightStemFilterFactory"/>
>>>>  </analyzer>
>>>>  <analyzer type="query">
>>>>   <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>>>>   <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>   <filter class="solr.ElisionFilterFactory" articles="lang/contractions_fr.txt"/>
>>>>   <filter class="solr.LowerCaseFilterFactory"/>
>>>>   <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt"/>
>>>>   <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_fr.txt" ignoreCase="true"/>
>>>>   <filter class="solr.FrenchLightStemFilterFactory"/>
>>>>  </analyzer>
>>>> </fieldType>
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>> 
>> -- 
>> Sorry for being brief. Alternate email is rickleir at yahoo dot com

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

RE: solr usage reporting

2018-01-26 Thread Rick Leir
Becky,
There are excellent log analysis systems. Logstash? Awstats? I do not think 
Solr should do this. Some people index their logs into a separate Solr core for 
analysis, but it might be a challenge to do this in a useful way.
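If you do index the logs into their own core, the monthly report can then be a plain facet query; a hedged sketch, with illustrative core and field names:

  /solr/querylogs/select?q=*:*&rows=0&facet=true&facet.field=q_term&facet.limit=1000
      &fq=tstamp:[2018-01-01T00:00:00Z TO 2018-02-01T00:00:00Z]

Zero-result queries can then be selected with fq=hits:0, assuming a hits count is logged per request.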
Cheers -- Rick

On January 25, 2018 2:56:01 PM EST, Becky Bonner  wrote:
>That would work for a single server, but collecting the logs from the
>farm would be problematic since we would have logs from all nodes and
>replicas from all the members of the farm. We would then need to weed out
>what we are interested in and combine. It would be better if there were
>a way to query it within Solr.  I think something in Solr would be best
>... a separate collection that can be queried and reports generated
>from it.  The log does have the basic info we need though.
>
>
>-Original Message-
>From: Marco Reis [mailto:m...@marcoreis.net] 
>Sent: Thursday, January 25, 2018 11:14 AM
>To: solr-user@lucene.apache.org
>Subject: Re: solr usage reporting
>
>One way is to collect the log from your server and, then, use other
>tool to generate your report.
>
>
>On Thu, Jan 25, 2018 at 2:59 PM Becky Bonner 
>wrote:
>
>> Hi all,
>> We are in the process of replacing our Google Search Appliance with 
>> SOLR
>> 7.1 and are needing one last piece of our requirements.  We provide a
>
>> monthly report to our business that shows the top 1000 query terms 
>> requested during the date range as well as the query terms requested 
>> that contained no results.  Is there a way to log the requests and 
>> later query solr for these results? Or is there a plugin to add this
>functionality?
>>
>> Your help appreciated.
>> Bcubed
>>
>>
>> --
>Marco Reis
>Software Engineer
>http://marcoreis.net
>https://github.com/masreis
>+55 61 9 81194620

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: pf2

2018-01-26 Thread Rick Leir
Emir
Thanks, I will do when I get off this bus.

I have run the text thru the SolrAdmin Analyzer, it looks fine.

According to the debugQuery output, individual words match in the qf, but not 
the pair that pf2 should match.

I compare the configs for English and French, and they are the same apart from 
the analysis chain which is below. Only French fails. I will take out filters 
one by one and attempt to find which is causing this.
Cheers -- Rick

On January 26, 2018 4:09:51 AM EST, "Emir Arnautović" 
<emir.arnauto...@sematext.com> wrote:
>Hi Rick,
>Can you include sample of your query and text that should match.
>
>Thanks,
>Emir
>--
>Monitoring - Log Management - Alerting - Anomaly Detection
>Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 25 Jan 2018, at 23:13, Rick Leir <rl...@leirtech.com> wrote:
>> 
>> 
>> 
>> Hi all
>> My pf2 keywords^11.0 works for english not for french. Here are the
>fieldtypes, actually from two schema.xml's in separate cores. Solr
>5.2.2, edismax, q.op AND
>> I suspect there are several problems with the french schema. Maybe I
>only needed to show the query analyzer, not the index analyzer?
>> 
>> The pf2 does not show a match in the debugQuery=true output for the
>French. However, a qf keywords^10.0 does show a match. The keywords
>field is copyfielded into text, which is the df. Is there any other
>field I should be showing?
>> Thanks
>> Rick
>> 
>> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>>  <analyzer type="index">
>>   <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>>   <tokenizer class="solr.StandardTokenizerFactory"/>
>>   <filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms.txt" tokenizerFactory="solr.StandardTokenizerFactory"/>
>>   <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
>>   <filter class="solr.LowerCaseFilterFactory"/>
>>   <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>>   <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_en.txt" ignoreCase="true"/>
>>   <filter class="solr.SnowballPorterFilterFactory" language="English"/>
>>   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>  </analyzer>
>>  <analyzer type="query">
>>   <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>>   <tokenizer class="solr.StandardTokenizerFactory"/>
>>   <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
>>   <filter class="solr.LowerCaseFilterFactory"/>
>>   <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>>   <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_en.txt" ignoreCase="true"/>
>>   <filter class="solr.SnowballPorterFilterFactory" language="English"/>
>>   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>  </analyzer>
>> </fieldType>
>> 
>> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>>  <analyzer type="index">
>>   <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>>   <tokenizer class="solr.StandardTokenizerFactory"/>
>>   <filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms.txt" tokenizerFactory="solr.StandardTokenizerFactory"/>
>>   <filter class="solr.ElisionFilterFactory" articles="lang/contractions_fr.txt"/>
>>   <filter class="solr.LowerCaseFilterFactory"/>
>>   <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt"/>
>>   <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_fr.txt" ignoreCase="true"/>
>>   <filter class="solr.FrenchLightStemFilterFactory"/>
>>  </analyzer>
>>  <analyzer type="query">
>>   <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>>   <tokenizer class="solr.StandardTokenizerFactory"/>
>>   <filter class="solr.ElisionFilterFactory" articles="lang/contractions_fr.txt"/>
>>   <filter class="solr.LowerCaseFilterFactory"/>
>>   <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt"/>
>>   <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_fr.txt" ignoreCase="true"/>
>>   <filter class="solr.FrenchLightStemFilterFactory"/>
>>  </analyzer>
>> </fieldType>
>> 
>> 
>> 
>> -- 
>> Sorry for being brief. Alternate email is rickleir at yahoo dot com

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

pf2

2018-01-25 Thread Rick Leir


Hi all
My pf2 keywords^11.0 works for english not for french. Here are the fieldtypes, 
actually from two schema.xml's in separate cores. Solr 5.2.2, edismax, q.op AND
I suspect there are several problems with the french schema. Maybe I only 
needed to show the query analyzer, not the index analyzer?

The pf2 does not show a match in the debugQuery=true output for the French. 
However, a qf keywords^10.0 does show a match. The keywords field is 
copyfielded into text, which is the df. Is there any other field I should be 
showing?
Thanks
Rick

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms.txt" tokenizerFactory="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_en.txt" ignoreCase="true"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>
 <analyzer type="query">
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_en.txt" ignoreCase="true"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>
</fieldType>

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms.txt" tokenizerFactory="solr.StandardTokenizerFactory"/>
  <filter class="solr.ElisionFilterFactory" articles="lang/contractions_fr.txt"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt"/>
  <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_fr.txt" ignoreCase="true"/>
  <filter class="solr.FrenchLightStemFilterFactory"/>
 </analyzer>
 <analyzer type="query">
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.ElisionFilterFactory" articles="lang/contractions_fr.txt"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt"/>
  <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_fr.txt" ignoreCase="true"/>
  <filter class="solr.FrenchLightStemFilterFactory"/>
 </analyzer>
</fieldType>


-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com


Re: Solr 7.2.1 - cursorMark and elevateIds

2018-01-25 Thread Rick Leir

Greg

Does the CursorMark run correctly on its own, with no elevate?

cheers -- Rick


On 01/23/2018 08:36 PM, Greg Roodt wrote:

Hi

I'm trying to use the Query Elevation Component in conjunction with
CursorMark pagination. It doesn't seem to work. I get an exception. Are
these components meant to work together?

This works:
enableElevation=true&forceElevation=true&elevateIds=MAAMNqFV1dg

This fails:
cursorMark=*&enableElevation=true&forceElevation=true&elevateIds=MAAMNqFV1dg

Here is the stacktrace:

"""
'trace'=>'java.lang.ClassCastException: java.lang.Integer cannot be cast to
org.apache.lucene.util.BytesRef at




Re: SOLR Data Backup

2018-01-22 Thread Rick Leir


.
>
>BTW, why do we not recommend having Solr as a source of truth?
>
One reason is that you might want to tune the analysis chain and then reindex.

Or your data gets progressively larger, and you want to be able to recover from 
an OOM during indexing. 
Rick

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com


Re: Solr Exception: Undefined Field

2018-01-17 Thread Rick Leir
Deepak
Would you like to write your post again without asterisks? Include the 
asterisks which are necessary to the query, of course.
Rick

On January 17, 2018 1:10:28 PM EST, Deepak Goel  wrote:
>*Hello*
>
>*In Solr Admin: I type the q parameter as - *
>
>*text_entry:**
>
>*It gives the following exception (In the schema I do see a field as
>text_entry):*
>
>{ "responseHeader":{ "zkConnected":true, "status":400, "QTime":2,
>"params":{
>"q":"text_entry:*", "_":"1516190134181"}}, "error":{ "metadata":[
>"error-class","org.apache.solr.common.SolrException",
>"root-error-class",
>"org.apache.solr.common.SolrException"], "msg":"undefined field
>text_entry",
>"code":400}}
>
>
>*However when i type the q paramter as -*
>
>*{!term f=text_entry}henry*
>
>*This does give out the output as foll:*
>
>{ "responseHeader":{ "zkConnected":true, "status":0, "QTime":0,
>"params":{ "
>q":"{!term f=text_entry}henry", "_":"1516190134181"}},
>"response":{"numFound
>":262,"start":0,"docs":[ { "type":"line", "line_id":"80075",
>"play_name":"Richard
>II", "speech_number":"13", "line_number":"3.3.37", "speaker":"HENRY
>BOLINGBROKE", "text_entry":"Henry Bolingbroke", "id":
>"9428c765-a4e8-4116-937a-9b70e8a8e2de",
>"_version_":1588569205789163522, "
>speaker_str":["HENRY BOLINGBROKE"], "text_entry_str":["Henry
>Bolingbroke"],
>"line_number_str":["3.3.37"], "type_str":["line"],
>"play_name_str":["Richard
>II"]}, {
>**
>
>Any ideas what is going wrong in the first q?
>
>Thank You
>
>Deepak
>"Please stop cruelty to Animals, help by becoming a Vegan"
>+91 73500 12833
>deic...@gmail.com
>
>Facebook: https://www.facebook.com/deicool
>LinkedIn: www.linkedin.com/in/deicool
>
>"Plant a Tree, Go Green"
>
>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: How to implement the function of W/N in Solr?

2018-01-16 Thread Rick Leir
Xi
Might this be something you can solve with pf or pf2? Googling "solr pf" will 
find this for you. Adjust the slop (ps/ps2) to allow for terms which are not 
immediately adjacent.
Rick

On January 15, 2018 3:04:40 AM EST, "xizhen.w...@incoshare.com" 
 wrote:
>Hello,
>
>I'm using Solr 4.10.3, and I want "A" and "B" are together, "C" and "D"
>are together, and the terms "B" and "C" are no more than 3 terms away
>from each other, by using {!surround} 3w("A B", "C D"), but it doesn't
>work.  Is there any other useful way?
>
>Any help is appreciated.
>
>
>
>xizhen.w...@incoshare.com

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: ClassicTokenizer

2018-01-10 Thread Rick Leir
Shawn
I did not express that clearly. 
The reference guide says "The Classic Tokenizer preserves the same behavior as 
the Standard Tokenizer of Solr versions 3.1 and previous. "

So I am curious to know why they changed StandardTokenizer after 3.1 to break 
on hyphens, when it seems to me to work better the old way?
Thanks
Rick

On January 9, 2018 7:07:59 PM EST, Shawn Heisey <apa...@elyograg.org> wrote:
>On 1/9/2018 9:36 AM, Rick Leir wrote:
>> A while ago the default was changed to StandardTokenizer from
>ClassicTokenizer. The biggest difference seems to be that Classic does
>not break on hyphens. There is also a different character pr(mumble). I
>prefer the Classic's non-break on hyphens.
>
>To have any ability to research changes, we're going to need to know
>precisely what you mean by "default" in that statement.
>
>Are you talking about the example schemas, or some kind of inherent
>default when an analysis chain is not specified?
>
>Probably the reason for the change is an attempt to move into the
>modern
>era, become more standardized, and stop using old/legacy
>implementations.  The name of the new default contains the word
>"Standard" which would fit in with that goal.
>
>I can't locate any changes in the last couple of years that change the
>classic tokenizer to standard.  Maybe I just don't know the right place
>to look.
>
>> What was the reason for changing this default? If I understand this
>better I can avoid some pitfalls, perhaps.
>
>If you are talking about example schemas, then the following may apply:
>
>Because you understand how analysis components work well enough to even
>ask your question, I think you're probably the kind of admin who is
>going to thoroughly customize the schema and not rely on the defaults
>for TextField types that come with Solr.  You're free to continue using
>the classic tokenizer in your schema if that meets your needs better
>than whatever changes are made to the examples by the devs.  The
>examples are only starting points, virtually all Solr installs require
>customizing the schema.
>
>Thanks,
>Shawn

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

ClassicTokenizer

2018-01-09 Thread Rick Leir
Hi all
A while ago the default was changed to StandardTokenizer from ClassicTokenizer. 
The biggest difference seems to be that Classic does not break on hyphens. 
There is also a different character pr(mumble). I prefer the Classic's 
non-break on hyphens. 

What was the reason for changing this default? If I understand this better I 
can avoid some pitfalls, perhaps.
Thanks -- Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Deliver static html content via solr

2018-01-05 Thread Rick Leir
Erik, sorry, I didn't mean to say Velocity has a security problem. I am just 
thinking that people will see it in action and think it is a full answer to a 
front-end web app, though it has no input filtering or range checking (as an 
output template system, natch). 
What do you recommend for a very basic input filter in front of Solr with 
Velocity?
Thanks
Rick

On January 5, 2018 10:11:31 AM EST, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>Rick - fair enough, indeed.
>
>However, for a “static” resource, no Velocity syntax or learning curve
>needed.   In fact, correcting myself, VelocityResponseWriter isn’t even
>part of the picture for serving a static resource. 
>
>Have a look at example/files -
>https://github.com/apache/lucene-solr/tree/master/solr/example/files
>
>The <head> of each page (from head.vm) pulls a “static” resource like
>this:
>
><script src="#{url_for_solr}/admin/file?file=/velocity/js/jquery.tx3-tag-cloud.js&contentType=text/javascript"></script>
>
>The /admin/file handler will serve the bytes of any resource in config.
> 
>
>As for separate front-end app - always recommended by me, to be sure
>for real(!) applications, but for internal, one-off, quick and dirty,
>prototyping, showing off, or handy utility kinda things I’m not opposed
>to doing the Simplest Possible Thing That Works.As for security -
>VelocityResponseWriter doesn’t itself add any additional security
>concerns to Solr - it just transforms the Solr response into some
>textual (often HTML) format, instead of JSON or XML - so it itself
>isn’t a security concern.   What you need to do for Solr proper for
>security is a different story, but that is irrelevant to whether
>wt=velocity is in the mix.   It can actually be handy to use
>wt=velocity from inside a real app - it has been used it for generating
>e-mails in production systems and simply returning something formatted
>textually the way you want without an app template tier having to do
>so.   And Velocity, true to name, ain’t slow.
>
>For more on /browse, VrW, and example/files usage of those, check out
>https://lucidworks.com/2015/12/08/browse-new-improved-solr-5/
>
>   Erik
>
>
>
>> On Jan 5, 2018, at 4:19 AM, Rick Leir <rl...@leirtech.com> wrote:
>> 
>> Using Velocity, you can have some results-driven HTML served by Solr
>and all your JS, CSS etc 'assets' served by Apache from /var/www/html.
>Warning: the Velocity learning curve is steep and you still need a
>separate front-end web app for security because Velocity is a
>templating output filter. Erik, please correct me!
>> 
>> cheers -- Rick
>> 
>> 
>> On 01/04/2018 11:45 AM, Erik Hatcher wrote:
>>> All judgements aside on whether this is a preferred way to go, have
>a look at /browse and the VelocityResponseWriter (wt=velocity).  It can
>serve static resources.
>>> 
>>> I’ve built several prototypes this way that have been effective and
>business generating.
>>> 
>>>Erik
>>> 
>>>> On Jan 4, 2018, at 11:19, Matthias Geiger <matzschman...@gmail.com>
>wrote:
>>>> 
>>>> Hello,
>>>> i have a web application that delivers static html content to the
>user.
>>>> 
>>>> I have been thinking about the possibility to deliver this content
>from
>>>> solr instead of delivering it from the filesystem.
>>>> This would prevent the "double" stored content (html files on file
>>>> systems + additional solr cores)
>>>> 
>>>> Is this a viable approach or a no go?
>>>> In case of a no go why do you think it is wrong
>>>> 
>>>> In case of the suggestion of a nosql database, what makes noSql
>superior to
>>>> solr?
>>>> 
>>>> Regards and Thanks for your time
>> 

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Deliver static html content via solr

2018-01-05 Thread Rick Leir
Using Velocity, you can have some results-driven HTML served by Solr and 
all your JS, CSS etc 'assets' served by Apache from /var/www/html. 
Warning: the Velocity learning curve is steep and you still need a 
separate front-end web app for security because Velocity is a templating 
output filter. Erik, please correct me!


cheers -- Rick


On 01/04/2018 11:45 AM, Erik Hatcher wrote:

All judgements aside on whether this is a preferred way to go, have a look at 
/browse and the VelocityResponseWriter (wt=velocity).  It can serve static 
resources.

I’ve built several prototypes this way that have been effective and business 
generating.

Erik


On Jan 4, 2018, at 11:19, Matthias Geiger  wrote:

Hello,
i have a web application that delivers static html content to the user.

I have been thinking about the possibility to deliver this content from
solr instead of delivering it from the filesystem.
This would prevent the "double" stored content (html files on file
systems + additional solr cores)

Is this a viable approach or a no go?
In case of a no go why do you think it is wrong

In case of the suggestion of a nosql database, what makes noSql superior to
solr?

Regards and Thanks for your time




Re: SOLR SSL Java command line properties

2018-01-05 Thread Rick Leir

Bob

Thanks for mentioning the jetty-ssl.xml file.

I have a follow-on question: since it is strongly recommended that you 
host Solr behind a web app (perhaps solr-security-proxy is adequate), 
the Solr REST interface will not be on the open Internet, so perhaps 
HTTP is the appropriate protocol?


Unless you have Solr authentication and do not trust all the internal 
hosts. I could be quite wrong, please correct.


cheers -- Rick


On 01/04/2018 11:51 AM, Bob Feider wrote:
When I use the provided Apache SOLR startup script (version 6.6.0), 
the script creates and then executes a java command line that has two 
sets of SSL properties whose related elements are set to the same 
values. One set has property names like javax.net.ssl.* while the 
other set has names like solr.jetty.*. For example:

   java -server ... -Dsolr.jetty.keystore.password=secret
   ... -Djavax.net.ssl.keyStorePassword=secret ... -jar start.jar
   --module=https

Our security team does not allow passwords to be passed along on the 
command line or in environment variables but will allow them to be 
placed in a file provided the file has restricted access permissions. 
I noticed that there is a jetty-ssl.xml file in the 
solr/server/etc directory that can be used to provide default values 
for the SOLR SSL related properties including the 
solr.jetty.keystore.password. When I remove the 
javax.net.ssl.keyStorePassword and solr.jetty.keystore.password 
properties from the java command line and update the jetty-ssl.xml 
file with my default keystore password, SOLR appears to start properly 
with the default keystore password contained in that file. I can then 
connect with my browser to https://localhost:8983/solr/# and access 
the SOLR Admin page just fine.


Are the javax.net.ssl.* properties used at all in the SOLR 
standalone or SOLR cloud products?


Do I need to provide the javax.net.ssl.* properties on the command 
line for proper operation or can I get away with simply providing them 
in the jetty-ssl.xml file?


I am concerned that they are used behind the scenes outside of the 
browser to SOLR server connections to connect to other processes like 
zookeeper and that by doing this I will uncover some problem down the 
road that my simple testing has not revealed. The only direct 
reference to the properties I can see in the source code is in the 
solr embedded code that is part of the solrj client inside the 
SSLConfig Java class.


Thanks for your help,

Bob






Re: Small Tokenization issue

2018-01-05 Thread Rick Leir

Nawab

Look at classicTokenizer. It is a good choice if you have part numbers 
with hyphens. The second tokenizer on this page: 
https://lucene.apache.org/solr/guide/6_6/tokenizers.html
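For reference, a minimal fieldType sketch using it (the name is illustrative):

  <fieldType name="text_classic" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.ClassicFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

With this chain a part number like AB-123-X7 is kept as a single token, since hyphenated tokens containing a digit are treated as product numbers.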


Cheers -- Rick


On 01/03/2018 04:52 PM, Shawn Heisey wrote:

On 1/3/2018 1:56 PM, Nawab Zada Asad Iqbal wrote:

Thanks Emir, Erick.

What i want to do is remove empty tokens after 
WordDelimiterGraphFilter ?
Is there any such option in WordDelimiterGraphFilter to not generate 
empty

tokens?


I use LengthFilterFactory with a minimum of 1 and a maximum of 512 to 
remove empty tokens.
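For the archives, that looks like this in the analysis chain (min/max per the values above):

  <filter class="solr.LengthFilterFactory" min="1" max="512"/>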


Thanks,
Shawn





Re: DIH XPathEntityProcessor XPath subset?

2018-01-05 Thread Rick Leir

Stefan

There is at least one free Solr WP plugin. There are several Solr PHP 
toolkits on github. Start with these unless your WP is wildly customized.


cheers -- Rick


On 01/03/2018 11:50 AM, Erik Hatcher wrote:

Stefan -

If you pre-transform the XML, I’d personally recommend either transforming it 
into straight up Solr XML (docs/fields/values) or some other format or posting 
directly to Solr.   Avoid this DIH thing when things get complicated.

Erik


On Jan 3, 2018, at 11:40 AM, Stefan Moises  wrote:

Hi there,

I'm trying to index a wordpress site using DIH XPathEntityProcessor... I've 
read it only supports a subset of XPath, but I couldn't find any docs what 
exactly is supported.

After some painful trial and error, I've found that xpath expressions like the 
following don't work:

 

I want to find elements like this ("the 'value' element after a 'member' element 
with a name element 'post_title'"):

<struct>
  <member>
    <name>post_id</name>
    <value><string>11809</string></value>
  </member>
  <member>
    <name>post_title</name>
    <value><string>Some titel</string></value>
  </member>
</struct>

Unfortunately that is the default output structure of Wordpress' XMLrpc calls.

My Xpath expression works e.g. when testing it with 
https://www.freeformatter.com/xpath-tester.html but not if I try to index it 
with Solr. Any ideas? Or do I have to pre-transform the XML myself to match 
XPathEntityProcessor's limited abilities?

Thanks in advance,

Stefan

--
--

Stefan Moises
Manager Research & Development
shoptimax GmbH
Ulmenstraße 52 H
90443 Nürnberg
Tel.: 0911/25566-0
Fax: 0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de

Geschäftsführung: Friedrich Schreieck
Ust.-IdNr.: DE 814340642
Amtsgericht Nürnberg HRB 21703
  





Re: SolrJ with Async Http Client

2018-01-02 Thread Rick Leir
Agrawal
There is good reading on the topic at
https://wiki.apache.org/solr/IntegratingSolr
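For the archives, a minimal sketch of one way to get non-blocking behaviour by wrapping the (blocking) SolrJ client in a CompletableFuture; this runs the call off-thread rather than doing true non-blocking I/O, and the URL and query are illustrative:

  import java.util.concurrent.CompletableFuture;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class AsyncSolrSketch {
    public static void main(String[] args) throws Exception {
      HttpSolrClient client =
          new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
      CompletableFuture<QueryResponse> future = CompletableFuture.supplyAsync(() -> {
        try {
          return client.query(new SolrQuery("*:*"));  // blocking call on a pool thread
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      });
      future.thenAccept(rsp ->
          System.out.println("hits: " + rsp.getResults().getNumFound()));
      future.join();   // demo only: keep the JVM alive until the reply arrives
      client.close();
    }
  }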
Cheers -- Rick

On January 2, 2018 10:31:28 AM EST, RAUNAK AGRAWAL  
wrote:
>Hi Guys,
>
>I am trying to write fully async service where solr calls are also
>async.
>Just wondering did anyone tried calling solr in non-blocking mode or is
>there is a way to do it? I have come across one such project
> but wondering is there anything
>provided
>by solrj?
>
>Thanks

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Solr Issue

2018-01-02 Thread Rick Leir
Lewin
Is this not a job for a database like MySQL? Solr is a search engine, which can 
be used as a DB with some effort. Choose the right tool for the job. Cheers -- 
Rick

On January 2, 2018 4:35:47 PM EST, "Lewin Joy (TMNA)"  
wrote:
>** PROTECTED 関係者外秘
>Hi,
>
>I am using Solr 6.1 and am facing an issue with a complex scenario.
>Could you help figure out how this can be achieved in Solr?
>
>We have items: A, B, and C. There will be multiple record entries for
>each item.
>For our understanding, let’s say the fields for these records are:
>primary_key,item_name,status.
>
>I need to retrieve all records with status ‘N’ and filter out items
>which have any record matching status ‘Y’.
>
>For record set below, the query should only return me records 1 and 2.
>Primary_key  Item_Name  status
>1            A          N
>2            A          N
>3            B          N
>4            B          Y
>5            B          N
>6            C          Y
>7            C          N
>
>
>
>Currently, I am using Streaming Query expressions to do complement()
>operation.
>But the number of records with status ‘Y’ is too huge and causes
>performance problems.
>Secondly, streaming query exports with joins and complements can’t
>be used properly for producing paginated output.
>
>Is there any way we can group the results and do a query on the group
>to filter out such records?
>Or any other approach which could give me paginated results?
>
>Thanks,
>Lewin

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Enable default wildcard search

2017-12-29 Thread Rick Leir
Siarhei:
Will you be putting up your system at github? I would like to Solr-ize my 
dovecot.

Maybe you saw this already:
https://github.com/dovecot/core/blob/master/doc/solr-schema.xml

https://github.com/dovecot/core/blob/master/src/plugins/fts-solr/solr-connection.c

https://github.com/dovecot/core/blob/master/src/plugins/fts-solr/fts-solr-plugin.h

https://github.com/bdraco/dovecot/blob/master/doc/wiki/Plugins.FTS.Solr.txt
Cheers -- Rick

On December 28, 2017 4:15:06 PM EST, Siarhei Chystsiakou  
wrote:
>Hi
>Does anyone have any idea how to fix this?
>
>2017-12-27 13:34 GMT+01:00 Siarhei Chystsiakou :
>
>> Hi everybody!
>> I am trying to integrate Solr 6.6.1 with my email server (dovecot 2.32). I
>> have the following settings:
>>
>> schema.xml - https://pastebin.com/1XXWTs8V
>> solrconfig.xml - https://pastebin.com/5HSswCcv
>>
>> But under these settings, the search only works on exact matches; for
>> instance, if I search for Chris it doesn't find Christmas. The client
>> does not support wildcard search. I would like to know how to turn on
>> wildcard search for all queries.
>>
>> I tried to do that by adding the following line to schema.xml
>>
>> <filter class="solr.NGramFilterFactory" maxGramSize="25"/>
>>
>> but when I added it, Solr 6.6.1 very often showed errors during the
>> indexing, which led to its full crash, even the web interface didn't
>> respond, only the full Solr restart helped. This problem emerged both
>on
>> Solr 6.6.1 and Solr 7.2
>>
>> Also, in case of this option, the search result was not what I
>expected.
>> For example, when I searched for the word domain, the words domes and
>> domain were also included. I suppose, that from the point of view of
>this
>> operation, the result is correct, but this is not what I need.
>>
>> That is why I would like to know, how to turn on the standard
>wildcard
>> search. As it is impossible on the client's side, I would like to
>manage it
>> from the Solr side.
>>
>> Thanks.
>>
>>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: solrcloud through aws elb

2017-12-26 Thread Rick Leir
Per,
This is more of a question for the Drupal folks. But in passing, I would 
suggest that you show your config and what you saw in your logs. And my guess 
is firewall problems!
Cheers -- Rick

On December 26, 2017 3:37:39 AM EST, Per Qvindesland  wrote:
>Hi All
>
>I am trying to connect to a SolrCloud through an ELB from a Drupal 7
>install. The ELB is a TCP ELB which seems to work well; Drupal says it
>can talk to the Solr install through the ELB, but when I try to index,
>nothing seems to happen.
>
>Does anyone have any ideas on how to resolve this? Or have any other
>suggestions on how to achieve redundancy? For the moment I am
>connecting directly to one of the SolrCloud instances, but
>should that fail then I don’t think I would have any redundancy.
>
>Regards
>Per

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: start with techproducts example in docker

2017-12-19 Thread Rick Leir
Christine
I think this is a long-lived docker container, meaning that it does not 
terminate after the command you showed. If so, you should be able to start a 
console or ssh session to it. Have a look at the solr.log. Better still, start 
the techproducts example within this ssh session, and see what errors occur. 
Cheers -- Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: How to restrict the fields solr returns?

2017-12-19 Thread Rick Leir
The fl parameter is used for this.
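For example (field names are illustrative):

  /solr/mycore/select?q=*:*&fl=id,title,score

If the restriction must be enforced rather than merely requested, fl can be pinned as an invariant on the request handler in solrconfig.xml:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="invariants">
      <str name="fl">id,title,score</str>
    </lst>
  </requestHandler>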

On December 19, 2017 3:22:59 AM EST, Solrmails  wrote:
>Hey
>
>I'm using a custom "QParserPlugin" to restrict which documents are
>returned to the user.
>Now I'd like to restrict also the fields that are returned with a
>document. I couldn't find a good entry point to do such a restriction.
>Maybe I  could missuse a "QueryResponseWriter" plugin but that sounds
>like a bad idea.
>
>Any other ideas?

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: How to restart solr in docker?

2017-12-18 Thread Rick Leir
Christine
Have a look at the API 
Lucene.apache.org/solr/guide/6_6/config-api.html
(Choose whatever version of the doc that is appropriate)
Various parts of solrconfig can be overlaid.
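A hedged sketch of that overlay approach for the Suggest case; the core name, fields, and lookup are illustrative, not a drop-in config:

  curl http://localhost:8983/solr/mycore/config \
    -H 'Content-type:application/json' -d '{
    "add-searchcomponent": {
      "name": "suggest",
      "class": "solr.SuggestComponent",
      "suggester": {
        "name": "default",
        "lookupImpl": "FuzzyLookupFactory",
        "dictionaryImpl": "DocumentDictionaryFactory",
        "field": "title"
      }
    }
  }'

The edit is persisted in configoverlay.json, so it survives a container restart as long as that file lives on a docker volume.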

Or do your debugging with a normally installed Solr, then dockerize.
Cheers -- Rick


On December 18, 2017 12:28:25 PM EST, "Buckler, Christine" 
 wrote:
>That makes sense. I am trying to add the “Suggest” plugin so I modified
>the solrconfig.xml file. Is there a better way to do what I am trying
>to do? I have not been able to add the plugin successfully. Do you have
>a resource page that shows how to add the config file under a volume? 
>
>On 12/16/17, 3:17 AM, "alexpusch"  wrote:
>
>  While I don't know what exact solr image you use I can tell you this:
>
>1. The command of your dockerfile probably starts solr. A Docker
>container
>will automatically shutdown if the process that was started by it's
>command
>is killed. Meaning you should never 'restart' a process in a container,
>but
>restart the container as a whole.
>2. You need to make sure your solrconfig.xml is under a docker volume
>of
>some kind. If it is not, your changes will not take effect since after
>the
>container restart the solrconfig.xml will revert to the version that is
>in
>the image.
>
>
>
>
>--
>   Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Synonyms

2017-12-17 Thread Rick Leir
Hi All
Doug Turnbull's latest blog  
http://opensourceconnections.com/blog/2017/11/21/solr-synonyms-mea-culpa/ at 
OpenSourceConnections is great, I learned lots. He mentions Wordnet  the 
lexical database for the English language. If you are using his suggested 
synonyms in Solr to 'tune' tf/idf then your synonym file will have synonyms for 
hypernyms (correct me if I got this wrong). Does it make any sense to import 
the hypernym records from Wordnet in bulk? 
Thanks -- Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: How to implement Incremental Indexing.

2017-12-11 Thread Rick Leir
Fiz
Here is a blog article that seems to cover your plans
https://www.toadworld.com/platforms/nosql/b/weblog/archive/2017/02/03/indexing-mongodb-data-in-apache-solr

Also look at github, there are several projects which could do it for you.
Cheers -- Rick

On December 11, 2017 5:19:43 PM EST, Fiz Newyorker  wrote:
>Hello Solr Group Team,
>
>I am working on Solr 6.5 and indexing data from MongoDB 3.2.5. I want
>to
>know the best practices to implement incremental indexing.
>
>Every 30 mins the Updated Data in Mongo DB needs to indexed on Solr.
>How to
>implement this. ? How would Solr know whenever there is an update on
>Mongodb ?  Indexing should run automatically. Should I setup any crone
>Jobs
>?
>
> Please let me know how to proceed further on the above requirement.
>
>Right now I am doing indexing manually.
>
>Thanks
>Fiz Ahmed

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

RE: FW: Need Help Configuring Solr To Work With Nutch

2017-12-09 Thread Rick Leir
Ara
The config for soft commit would not be in schema.xml; please look in 
solrconfig.xml.
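For reference, a minimal sketch of the relevant solrconfig.xml section (the interval values are illustrative):

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>60000</maxTime>            <!-- hard commit: flushes to disk -->
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>5000</maxTime>             <!-- soft commit: makes docs searchable -->
    </autoSoftCommit>
  </updateHandler>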

Look in solr.log for evidence of commits occurring. Explore the SolrAdmin 
console, what are the document counts?

You can post snippets from your config files here.
Cheers --Rick


On December 8, 2017 4:23:00 PM EST, "Mukhopadhyay, Aratrika" 
<aratrika.mukhopadh...@mail.house.gov> wrote:
>Rick , 
>Thanks for your reply. I do not see any errors or exceptions in the
>solr logs. I have read that the my schema in nutch needs to match the
>schema in solr. When I change the schema in in the config directory and
>restart solr my changes are lost. Leaving the schema alone is the only
>way I can get the indexing job to run but I cant query for the data in
>solr. Would you like me to send you specific configuration files ? I
>cant seem to get this to work. 
>
>Kind regards,
>Aratrika Mukhopadhyay
>
>-Original Message-
>From: Rick Leir [mailto:rl...@leirtech.com] 
>Sent: Friday, December 08, 2017 4:06 PM
>To: solr-user@lucene.apache.org
>Subject: Re: FW: Need Help Configuring Solr To Work With Nutch
>
>Ara
>Softcommit might be the default in Solrconfig.xml, and if not then you
>should probably make it so. Then you need to have a look in solr.log if
>things are not working as you expect. 
>Cheers -- Rick
>
>On December 8, 2017 3:23:35 PM EST, "Mukhopadhyay, Aratrika"
><aratrika.mukhopadh...@mail.house.gov> wrote:
>>Erick,
>>Do I need to set the softCommit = true and prepareCommit to true in my
>
>>solrconfig ? I am still at a loss as to what is happening. Thanks
>again 
>>for your help.
>>
>>Aratrika
>>
>>From: Mukhopadhyay, Aratrika
>>Sent: Friday, December 08, 2017 11:34 AM
>>To: solr-user <solr-user@lucene.apache.org>
>>Subject: RE: Need Help Configuring Solr To Work With Nutch
>>
>>
>>Hello Erick ,
>>
>>   This is what I see in the logs :
>>
>>[screenshot of the indexing log]
>>
>>
>>
>>I am sorry, it's been a while since I worked with solr. I did not do 
>>anything to specifically commit the changes to the core. Thanks for 
>>your prompt attention to this matter.
>>
>>
>>
>>Aratrika Mukhopadhyay
>>
>>
>>
>>-Original Message-
>>From: Erick Erickson [mailto:erickerick...@gmail.com]
>>Sent: Friday, December 08, 2017 11:06 AM
>>To: solr-user
>><solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>>
>>Subject: Re: Need Help Configuring Solr To Work With Nutch
>>
>>
>>
>>1> do you see update messages in the Solr logs?
>>
>>2> did you issue a commit?
>>
>>
>>
>>Best,
>>
>>Erick
>>
>>
>>
>>On Fri, Dec 8, 2017 at 7:27 AM, Mukhopadhyay, Aratrika < 
>>aratrika.mukhopadh...@mail.house.gov<mailto:Aratrika.Mukhopadhyay@mail.
>>house.gov>>
>>wrote:
>>
>>
>>
>>> Good Morning,
>>
>>>
>>
>>>I am running nutch 2.3 , hbase 0.98 and I am integrating
>>
>>> nutch with solr 6.4. I have a successful crawl in nutch and when I
>>see
>>
>>> that it is indexing the content into solr. However I cannot query
>and
>>get any results.
>>
>> It’s as if Nutch isn’t writing anything to solr at all. I am stuck
>and
>>
>>> need someone who is familiar with solr/nutch to provide assistance.
>>
>>> Can someone please help ?
>>
>>>
>>
>>>
>>
>>>
>>
>>> This is what I see when I index into solr. I see no errors.
>>
>>>
>>
>>>
>>
>>>
>>
>>>
>>
>>>
>>
>>>
>>
>>>
>>
>>> Regards,
>>
>>>
>>
>>> Aratrika Mukhopadhyay
>>
>>>
>
>--
>Sorry for being brief. Alternate email is rickleir at yahoo dot com 

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: FW: Need Help Configuring Solr To Work With Nutch

2017-12-08 Thread Rick Leir
Ara
Soft commit might already be enabled in solrconfig.xml, and if not then you 
should probably make it so. Then you need to have a look in solr.log if things 
are not working as you expect. 
Cheers -- Rick

On December 8, 2017 3:23:35 PM EST, "Mukhopadhyay, Aratrika" 
 wrote:
>Erick,
>Do I need to set the softCommit = true and prepareCommit to true in my
>solrconfig ? I am still at a loss as to what is happening. Thanks again
>for your help.
>
>Aratrika
>
>From: Mukhopadhyay, Aratrika
>Sent: Friday, December 08, 2017 11:34 AM
>To: solr-user 
>Subject: RE: Need Help Configuring Solr To Work With Nutch
>
>
>Hello Erick ,
>
>   This is what I see in the logs :
>
>[screenshot of the indexing log]
>
>
>
>I am sorry, it's been a while since I worked with solr. I did not do
>anything to specifically commit the changes to the core. Thanks for
>your prompt attention to this matter.
>
>
>
>Aratrika Mukhopadhyay
>
>
>
>-Original Message-
>From: Erick Erickson [mailto:erickerick...@gmail.com]
>Sent: Friday, December 08, 2017 11:06 AM
>To: solr-user
>>
>Subject: Re: Need Help Configuring Solr To Work With Nutch
>
>
>
>1> do you see update messages in the Solr logs?
>
>2> did you issue a commit?
>
>
>
>Best,
>
>Erick
>
>
>
>On Fri, Dec 8, 2017 at 7:27 AM, Mukhopadhyay, Aratrika <
>aratrika.mukhopadh...@mail.house.gov>
>wrote:
>
>
>
>> Good Morning,
>
>>
>
>>I am running nutch 2.3 , hbase 0.98 and I am integrating
>
>> nutch with solr 6.4. I have a successful crawl in nutch and when I
>see
>
>> that it is indexing the content into solr. However I cannot query and
>get any results.
>
>> It’s as if Nutch isn’t writing anything to solr at all. I am stuck and
>
>> need someone who is familiar with solr/nutch to provide assistance.
>
>> Can someone please help ?
>
>>
>
>>
>
>>
>
>> This is what I see when I index into solr. I see no errors.
>
>>
>
>>
>
>>
>
>>
>
>>
>
>>
>
>>
>
>> Regards,
>
>>
>
>> Aratrika Mukhopadhyay
>
>>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: indexing XML stored on HDFS

2017-12-07 Thread Rick Leir
Matthew, Oops, I should have mentioned re-indexing. With Solr, you want to be 
able to re-index quickly so you can try out different analysis chains. XSLT may 
not be fast enough for this if you have millions of docs. So I would be 
inclined to save the docs to a normal filesystem, perhaps in JSONL. Then use 
DIH or post tool or Python to post the docs to Solr.
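For the archives, a minimal SolrJ sketch of that last step (URL and field names are illustrative):

  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class PostToSolrSketch {
    public static void main(String[] args) throws Exception {
      HttpSolrClient client =
          new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-1");
      doc.addField("title", "a title from the transformed XML");
      client.add(doc);   // for throughput, batch with client.add(Collection<SolrInputDocument>)
      client.commit();   // or rely on autoCommit/autoSoftCommit in solrconfig.xml
      client.close();
    }
  }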
Rick

On December 7, 2017 10:14:37 AM EST, Rick Leir <rl...@leirtech.com> wrote:
>Matthew,
>Do you have some sort of script calling xslt? Sorry, I do not know
>Scala and I did not have time to look into your spark utils.  The
>script or Scala could then shell out to curl, or if it is python it
>could use the requests library to send a doc to Solr. Extra points for
>batching the documents. 
>
>Erick
>The last time I used the post tool, it was spinning up a jvm each time
>I called it (natch). Is there a simple way to launch it from a Java app
>server so you can call it repeatedly without the start-up overhead? It
>has been a few years, maybe I am wrong.
>Cheers -- Rick
>
>On December 6, 2017 5:36:51 PM EST, Erick Erickson
><erickerick...@gmail.com> wrote:
>>Perhaps the bin/post tool? See:
>>https://lucidworks.com/2015/08/04/solr-5-new-binpost-utility/
>>
>>On Wed, Dec 6, 2017 at 2:05 PM, Matthew Roth <mgrot...@gmail.com>
>>wrote:
>>> Hi All,
>>>
>>> Is there a DIH for HDFS? I see this old feature request [0
>>> <https://issues.apache.org/jira/browse/SOLR-2096>] that never seems
>>to have
>>> gone anywhere. Google searches and searches on this list don't get
>me
>>to
>>> far.
>>>
>>> Essentially my workflow is that I have many thousands of XML
>>documents
>>> stored in hdfs. I run an xslt transformation in spark [1
>>> <https://github.com/elsevierlabs-os/spark-xml-utils>]. This
>>transforms to
>>> the expected solr input of <add><doc>…</doc></add>. This is
>>> then written back to hdfs. Now how do I get it back to solr? I
>>suppose
>>> I could move the data back to the local fs, but on the surface that
>>feels
>>> like the wrong way.
>>>
>>> I don't need to store the documents in HDFS after the spark
>>transformation,
>>> I wonder if I can write them using solrj. However, I am not really
>>familiar
>>> with solrj. I am also running a single node. Most of the material I
>>have
>>> read on spark-solr expects you to be running SolrCloud.
>>>
>>> Best,
>>> Matt
>>>
>>>
>>>
>>> [0] https://issues.apache.org/jira/browse/SOLR-2096
>>> [1] https://github.com/elsevierlabs-os/spark-xml-utils
>
>-- 
>Sorry for being brief. Alternate email is rickleir at yahoo dot com

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: indexing XML stored on HDFS

2017-12-07 Thread Rick Leir
Matthew,
Do you have some sort of script calling xslt? Sorry, I do not know Scala and I 
did not have time to look into your spark utils.  The script or Scala could 
then shell out to curl, or if it is python it could use the request library to 
send a doc to Solr. Extra points for batching the documents. 

Erick
The last time I used the post tool, it was spinning up a jvm each time I called 
it (natch). Is there a simple way to launch it from a Java app server so you 
can call it repeatedly without the start-up overhead? It has been a few years, 
maybe I am wrong.
Cheers -- Rick

On December 6, 2017 5:36:51 PM EST, Erick Erickson  
wrote:
>Perhaps the bin/post tool? See:
>https://lucidworks.com/2015/08/04/solr-5-new-binpost-utility/
>
>On Wed, Dec 6, 2017 at 2:05 PM, Matthew Roth 
>wrote:
>> Hi All,
>>
>> Is there a DIH for HDFS? I see this old feature request [0
>> ] that never seems
>to have
>> gone anywhere. Google searches and searches on this list don't get me
>to
>> far.
>>
>> Essentially my workflow is that I have many thousands of XML
>documents
>> stored in hdfs. I run an xslt transformation in spark [1
>> ]. This
>transforms to
>> the expected solr input of <add><doc>…</doc></add>. This is
>> then written back to hdfs. Now how do I get it back to solr? I
>suppose
>> I could move the data back to the local fs, but on the surface that
>feels
>> like the wrong way.
>>
>> I don't need to store the documents in HDFS after the spark
>transformation,
>> I wonder if I can write them using solrj. However, I am not really
>familiar
>> with solrj. I am also running a single node. Most of the material I
>have
>> read on spark-solr expects you to be running SolrCloud.
>>
>> Best,
>> Matt
>>
>>
>>
>> [0] https://issues.apache.org/jira/browse/SOLR-2096
>> [1] https://github.com/elsevierlabs-os/spark-xml-utils

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Howto search for § character

2017-12-06 Thread Rick Leir
Bernd,
What is the analysis chain you have in schema.xml? The chain tokenizes text and 
filters characters. There is an index time chain and a query time chain. My 
suspicion is that your analysis chain is mapping that char to a plain ascii 
char. Use the SolrAdmin analysis tab to debug this.
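If the chain does turn out to be eating the character, one hedged option is to map it to a searchable token before tokenization; the mapping file name and replacement token here are illustrative:

  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-specialchars.txt"/>

with a line like this in mapping-specialchars.txt:

  "§" => " paragraphsign "

Queries for § then go through the same substitution and match the mapped token.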
Cheers -- Rick

On December 6, 2017 11:09:14 AM EST, Bernd Schmidt  
wrote:
>
>Hi all,
>
>
>we have defined a field named "_text_" for a full text search based on
>field-type "text_general":
><field name="_text_" type="text_general" indexed="true" stored="false"/>
>
>
>When trying to search for the "§" character, we have strange behaviour:
>
>
>q=_text_:§ AND entityClass:StructureNodeImpl  => numFound:469 (all
>nodes where entityClass:StructureNodeImpl)
>q=_text_:§ => numFound:0
>
>
>How can we search for the occurrence of the § character?
>
>
>Best regards, 
>    Bernd
>
> With kind regards,
>
> Bernd Schmidt
> Software Development
>
> b.schm...@eggheads.de
>
>
>
> eggheads GmbH
> Herner Straße 370
>44807 Bochum
>
>Fon +49 234 89397-0
>Fax +49 234 89397-28
> 
> www.eggheads.de

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Fwd: solr-security-proxy

2017-12-01 Thread Rick Leir
The default blacklist is qt and stream, because there are examples of nasty 
things which can be done using those parms. But it seems much wiser to 
whitelist just the parms your web app needs to use. Am I missing something? Is 
there a simpler way to protect a Solr installation which just serves a few AJAX 
GETs? Cheers -- Rick

On November 30, 2017 3:10:14 PM EST, Rick Leir <rl...@leirtech.com> wrote:
>Hi all
>I have just been looking at solr-security-proxy, which seems to be a
>great little app to put in front of Solr (link below). But would it
>make more sense to use a whitelist of Solr parameters instead of a
>blacklist?
>Thanks
>Rick
>
>https://github.com/dergachev/solr-security-proxy
>
>solr-security-proxy
>Node.js based reverse proxy to make a solr instance read-only,
>rejecting requests that have the potential to modify the solr index.
>--invalidParams   Block these query params (comma separated)  [default:
>"qt,stream"]
>
>
>-- 
>Sorry for being brief. Alternate email is rickleir at yahoo dot com

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Fwd: solr-security-proxy

2017-11-30 Thread Rick Leir
Hi all
I have just been looking at solr-security-proxy, which seems to be a great 
little app to put in front of Solr (link below). But would it make more sense 
to use a whitelist of Solr parameters instead of a blacklist?
Thanks
Rick

https://github.com/dergachev/solr-security-proxy

solr-security-proxy
Node.js based reverse proxy to make a solr instance read-only, rejecting 
requests that have the potential to modify the solr index.
--invalidParams   Block these query params (comma separated)  [default: 
"qt,stream"]


-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com


Re: Solr Wildcard Search

2017-11-30 Thread Rick Leir
George,
When you get those results it could be due to stemming: at index time, 
"shipping" is likely reduced to the stem "ship", which "ship*" matches but 
"shipp*" does not.

Wildcard processing expands your term to multiple terms, OR'd together. It also 
takes you down a different analysis pathway, as many analysis components do not 
work with multiple terms. Look into the SolrAdmin console, and use the Analysis 
tab to understand what is going on.

If you still have doubts, tell us more about your config.
Cheers --Rick


On November 30, 2017 7:06:42 AM EST, Georgy Nevsky 
 wrote:
>Can somebody help me understand how Solr Wildcard Search is working?
>
>If I do a search for the “ship*” term, I get many strings in the result,
>like “Shipping Weight”, “Ship From”, “Shipping Calculator”, etc.
>
>But if I’m searching for “shipp*” I don’t get any result.
>
>
>
>In the best we trust
>
>Georgy Nevsky

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Solr7 org.apache.lucene.index.IndexUpgrader

2017-11-27 Thread Rick Leir
Leo
Your low priority data could be accumulated in a Couchbase DB or just in JSONL. 
Then it would be easy to re-index.
Cheers -- Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Difference between UAX29URLEmailTokenizerFactory and ClassicTokenizerFactory

2017-11-24 Thread Rick Leir
Edwin
There is a spec for which characters are acceptable in an email name, and 
another spec for chars in a domain name. I suspect you will have more success 
with a tokenizer which is specialized for email, but I have not looked at 
UAX29URLEmailTokenizerFactory. Does ClassicTokenizerFactory split on hyphens? 
Cheers --Rick

On November 24, 2017 3:46:46 AM EST, Zheng Lin Edwin Yeo  
wrote:
>Hi,
>
>I am indexing email addresses into Solr via EML files. Currently, I am
>using ClassicTokenizerFactory with LowerCaseFilterFactory. However, I
>also
>found that we can also use UAX29URLEmailTokenizerFactory with
>LowerCaseFilterFactory.
>
>Does anyone have any recommendation on which Tokenizer is better?
>
>I am currently using Solr 6.5.1, and planning to upgrade to Solr 7.1.0.
>
>Regards,
>Edwin

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Solr7: Very High number of threads on aggregator node

2017-11-23 Thread Rick Leir
Nawab
What do you see in the log file?

If nothing else is solving the problem, then get a sample V7 solrconfig.xml and 
use it, modified to suit your needs.
Cheers -- Rick

On November 22, 2017 11:38:13 AM EST, Nawab Zada Asad Iqbal <khi...@gmail.com> 
wrote:
>Rick
>
>Your suspicion is correct. I mostly reused my config from solr4 except
>where it was deprecated or obsoleted and I switched to the newer
>configs:
>Having said that I couldn't find any new query related settings which
>can
>impact us, since most of our queries dont use fancy new features.
>
>I couldn't find a decent way to copy long xml here, so I created this
>stackoverflow thread:-
>
>https://stackoverflow.com/questions/47439503/solr-7-0-1-aggregator-node-spinning-many-threads
>
>
>Thanks!
>Nawab
>
>
>On Mon, Nov 20, 2017 at 3:10 PM, Rick Leir <rl...@leirtech.com> wrote:
>
>> Nawab
>> Why it would be good to share the solrconfigs: I had a suspicion that
>you
>> might be using the same solrconfig for version 7 and 4.5. That is
>unlikely
>> to work well. But I could be way off base.
>> Rick
>> --
>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Issue facing with spell text field containing hyphen

2017-11-21 Thread Rick Leir
Chirag
Look in Solr Admin, the Analysis panel. Put spider-man in the left and right 
text inputs, and see how it gets analysed. Cheers -- Rick

On November 20, 2017 10:00:49 PM EST, Chirag garg  wrote:
>Hi Rick,
>
>Actually my spell field also contains text with hyphens, i.e. it contains
>"spider-man"; even then I am not able to search it.
>
>Regards,
>Chirag
>
>
>
>--
>Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Issue facing with spell text field containing hyphen

2017-11-20 Thread Rick Leir
Chirag
Some scattered clues:
StandardTokenizer splits on punctuation, so your spell field might not contain 
spider-man.

When you do a wildcard search, the analysis chain can be different from what 
you expected.
Cheers -- Rick

On November 20, 2017 9:58:54 AM EST, Chirag Garg  wrote:
>Hi Team,
>
>I am facing issue for string containing hyphen when searched in spell
>field.
>My solr core is solr-6.6.0
>
>Points to reproduce:-
>Eg:- 1. My search string is "spider-man".
>2. When I do a search in solr with query spell:*spider-*. It shows
>numDocs=0 even though content is present.
>3 . But working fine when searched spell:*spider*.
>
>My config for solr in schema.xml is:-
>
><fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>  <analyzer type="index">
>    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>    <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt"
>        generateWordParts="1" generateNumberParts="1" catenateWords="1"
>        catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
>        preserveOriginal="1"/>
>    <filter class="solr.LowerCaseFilterFactory"/>
>    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"/>
>  </analyzer>
>  <analyzer type="query">
>    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>    <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt"
>        generateWordParts="1" generateNumberParts="1" catenateWords="0"
>        catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
>        preserveOriginal="1"/>
>    <filter class="solr.LowerCaseFilterFactory"/>
>    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"/>
>  </analyzer>
>  <analyzer type="multiterm">
>    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>    <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt"
>        generateWordParts="1" generateNumberParts="1" catenateWords="0"
>        catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
>        preserveOriginal="1"/>
>    <filter class="solr.LowerCaseFilterFactory"/>
>    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"/>
>  </analyzer>
></fieldType>
>
><fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>  <analyzer>
>    <tokenizer class="solr.StandardTokenizerFactory"/>
>    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>    <filter class="solr.LowerCaseFilterFactory"/>
>  </analyzer>
></fieldType>
>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Solr7: Very High number of threads on aggregator node

2017-11-20 Thread Rick Leir
Nawab
Why it would be good to share the solrconfigs: I had a suspicion that you might 
be using the same solrconfig for version 7 and 4.5. That is unlikely to work 
well. But I could be way off base. 
Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Fwd: CVE-2017-3163 - SOLR-5.2.1 version

2017-11-20 Thread Rick Leir
Pad
Read the CVE. Do you have an affected version of Solr? Do you have the 
replication feature enabled in solrconfig.xml? Note that it might be enabled by 
default. Test directory traversal on your system: can you read files remotely? 
No? Then you are finished.

A better plan: upgrade to a newer version of Solr (I know, you may not be able 
to).
Cheers -- Rick

On November 20, 2017 4:01:47 AM EST, padmanabhan gonesani 
 wrote:
>Please help me here
>
>
>
>-- Forwarded message --
>From: padmanabhan gonesani 
>Date: Mon, Nov 13, 2017 at 5:12 PM
>Subject: CVE-2017-3163 - SOLR-5.2.1 version
>To: gene...@lucene.apache.org
>
>
>
>Hi Team,
>
>*Description:* Apache Solr could allow a remote attacker to traverse
>directories on the system, caused by a flaw in the Index Replication
>feature. An attacker could send a specially-crafted request to read
>arbitrary files on the system (CVE-ID: CVE-2017-3163)
>
>Security vulnerability link: https://cve.mitre.org/cgi-bin/
>cvename.cgi?name=CVE-2017-3163
>
>*Apache SOLR implementation:*
>
>We are using Apache Solr-5.2.1 and replication factor=1 for index
>creation.
>We are using basic common SOLR features and it doesn't have the
>following
>features
>
>1. Index Replication
>2. Master / slave mechanism
>
>*Considering the above not implemented features will this "CVE-ID:
>CVE-2017-3163" security vulnerability have any impact?*
>
>Any help is appreciated here.
>
>
>Best Regards,
>Paddy G
>+91-8148593020 <+91%2081485%2093020>
>
>
>
>-- 
>
>
>Best Regards,
>Paddy G
>+91-8148593020

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Solr7: Very High number of threads on aggregator node

2017-11-18 Thread Rick Leir
Nawab
You probably need to share the relevant config to get an answer to this.
Cheers -- Rick

On November 17, 2017 2:19:03 PM EST, Nawab Zada Asad Iqbal  
wrote:
>Hi,
>
>I have a sharded solr7 cluster and I am using an aggregator node (which
>has
>no data/index of its own) to distribute queries and aggregate results
>from
>the shards. I am puzzled that when I use solr7 on the aggregator node,
>then
>number of threads shoots up to 32000 on that host and then the process
>reaches its memory limits. However, when i use solr4 on the aggregator,
>then it all seems to work fine. The peak number of threads during my
>testing were around 4000 or so. The test load is same in both cases,
>except
>that it doesn't finish in case of solr7 (due to the memory / thread
>issue).
>The memory settings and Jetty  threadpool setting (max=1) are also
>consistent in both servers (solr 4 and solr 7).
>
>
>Has anyone else been in similar circumstances?
>
>
>Thanks
>Nawab

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: DIH not stop

2017-11-16 Thread Rick Leir
Can,
I would like to learn many languages, but so far only two.

Shawn suggested you get help from a friend who knows English. As well, Google 
translate is great for me, but I have not used it with Turkish.
Cheers -- Rick

On November 16, 2017 5:19:33 AM EST, Shawn Heisey  wrote:
>On 11/15/2017 11:59 PM, Can Ezgi Aydemir wrote:
>> I configured Solr and Cassandra. Running full data import but not
>stop. Only core load during this process, stop it. Seeing that stop
>dih, not write dataimport.properties.
>> 
>> In dataconfig.xml file, i define simplepropertywriter type and
>filename. But not write it in dataimport.properties file.
>> 
>> How can i solve this problem?
>
>I wasn't sure if I should send this message, because it might be 
>interpreted as rude.  In the end, I decided to proceed.  Being rude is 
>not my intention, I would like to help.
>
>I am finding it difficult to understand your problem description.  Can 
>you try to describe your problem more completely?
>
>I see that you have shared a log where there are no error messages. 
>When there are no error messages, it's difficult to diagnose the 
>problem, so a detailed problem description is even more important.
>
>In particular, it might be helpful to get a full transcript from the
>DIH 
>"status" command at the point where you believe DIH has encountered a 
>problem.  The status command displays quite a lot of information about 
>the import that's underway and may reveal something that the log
>doesn't.
>
>Your email address says you're in the country that I know as Turkey, so
>
>perhaps you're finding it difficult to describe the problem in English.
>
>Since English is the language this list is typically conducted in, if 
>you have access to somebody who writes the language well and also knows
>
>your native language, it might be a good idea to ask them for some
>help. 
> You can also write your message in your native language and hope that 
>somebody can understand it or get it translated, but that's a little
>bit 
>less likely to get a response.
>
>Thanks,
>Shawn
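
Shawn's status transcript can be captured with one request; a minimal sketch in 
Python (host and core name are assumptions):

import json
import urllib.request

# Hedged sketch: fetch the DIH status for one core.
# "localhost" and "mycore" are assumptions -- use your own host and core.
url = "http://localhost:8983/solr/mycore/dataimport?command=status&wt=json"
with urllib.request.urlopen(url) as resp:
    status = json.load(resp)
print(status.get("status"))          # e.g. "busy" or "idle"
print(status.get("statusMessages"))  # rows fetched, docs processed, timings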

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: TimeZone issue

2017-11-16 Thread Rick Leir
Renuka
Are your clients all in the same time zone? Solr should support clients in 
several timezones, and UTC conversion to local is best done in the client, in 
my view. Thanks -- Rick

On November 16, 2017 6:54:47 AM EST, Renuka Srishti 
 wrote:
>Thanks for your response Shawn. I know it deals with UTC only, but it
>will
>be great if we can change the date timeZone in solr response. As I am
>using
>Solr CSV feature and it will be helpful if the date field in the CSV
>result
>can convert into client TimeZone. Please suggest if you have any
>alternate
>for this.
>
>Thanks
>Renuka Srishti
>
>On Wed, Nov 15, 2017 at 6:16 PM, Shawn Heisey 
>wrote:
>
>> On 11/15/2017 5:34 AM, Renuka Srishti wrote:
>>
>>> I am working on CSV export using Apache Solr. I have written all the
>>> required query and set wt as CSV. I am getting my results as I
>want,but
>>> the
>>> problem is TimeZone.
>>>
>>> Solr stores date value in UTC, but my client timeZone is different.
>Is
>>> there any way to convert date timeZone from UTC to clientTimeZone
>direclty
>>> in the Solr response?
>>>
>>
>> Not that I know of.  UTC is the only storage/transfer method that
>works in
>> all situations.  Converting dates to the local timezone is a task for
>the
>> client, when it displays the date to a user.
>>
>> Typically, you would consume the response from Solr into object types
>for
>> the language your application is written in.  A date value in the
>response
>> should end up in a date object.  Date objects in most programming
>languages
>> have the ability to display in specific timezones.
>>
>> Thanks,
>> Shawn
>>
>>
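
In Python, for example, Shawn's advice comes down to a few lines (a minimal 
sketch; the timestamp value and target zone are assumptions):

from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# Hedged sketch: convert one Solr UTC timestamp to a client timezone.
solr_value = "2017-11-16T11:54:47Z"        # Solr always returns UTC
utc = datetime.strptime(solr_value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
local = utc.astimezone(ZoneInfo("America/New_York"))
print(local.isoformat())                   # 2017-11-16T06:54:47-05:00

Applied row by row while post-processing the CSV export, this keeps Solr itself 
on UTC, as Shawn recommends.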

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: How to get a solr core to persist

2017-11-15 Thread Rick Leir
Hi Shawn, Amanda
When we put the data under /var/lib, I feel a need to put the config under 
/etc. Is this recommended, and would you use a symbolic link for the conf dir?
Cheers--Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Admin Console Question

2017-11-14 Thread Rick Leir
Homer
In Chrome, right-click and choose 'Inspect' at the bottom. Now go to the 
Network tab, then reload the page. Are you seeing errors? Tell us!
Thanks
Rick

On November 14, 2017 3:14:46 PM EST, Shawn Heisey  wrote:
>On 11/14/2017 11:43 AM, Webster Homer wrote:
>> I am using chrome Version 62.0.3202.94 (Official Build) (64-bit) I
>> only see a little icon and the word "Args" with nothing displayed. I
>> just checked with Firefox (version 56.0.2) and I see the same thing.
>
>That's the same version of Chrome that I used, and I do see the args.
>
>Later in the thread you mentioned that Solr is installed as a service. 
>I installed Solr 7.1.0 as a service on a Linux machine, and that
>version
>is also working.
>
>https://www.dropbox.com/s/w3htgf83mmdvgx4/solr71-service-admin-ui-java-args.png?dl=0
>
>Can you create a screenshot similar to the one I have shared via
>dropbox, so we can see everything you do?  You'll likely need to use a
>file sharing site -- attachments frequently are stripped by the mailing
>list software.
>
>Thanks,
>Shawn

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: recent utf8 problems

2017-11-07 Thread Rick Leir
Dr Krell
Item 11): It is best to get the solrconfig.xml provided with the new version of 
Solr, and change it to suit your needs. Do not try to work from the old 
version's solrconfig.xml.

I did not have time to read the other items. 

Look in solr.log, and compare the successful query with the unsuccessful one 
for clues, then look at the config for /select again.
Cheers -- Rick

On November 7, 2017 12:43:00 AM EST, "Dr. Mario Michael Krell" 
 wrote:
>Hi,
>
>thank you for your time and trying to narrow down my problem.
>
>1) When looking for Tübingen in the title, I am expecting the 3092484
>results. That sounds like a reasonable result. Furthermore, when
>looking at some of the results, they are exactly what I am looking for.
>
>2) I am testing them against the same solr server. This is a very
>simple testing setup, that brings our problem to the core. Originally,
>we used a urllib.request.urlopen query to get the data in Python and
>then send it to our webpage (http://search.mmcommons.org/) as a json
>object. I think, I should explain my test more clearly. We use a
>webbrowser (Firefox or Chrome) to open the admin console of the search
>engine, which is at http://localhost:8983/solr/#/mmc_search3/query
> on my local device.
>This is the default behavior. In this web browser, I use the query
>"title:T%C3%BCbingen” in the field “q” with /select as the
>“Request-Handler (qt)”. This approach works like a charm (result with
>echoParams attached). Also as asked by Rick, the request URL displayed
>in the upper left is just perfect:
>http://localhost:8983/solr/mmc_search3/select?echoParams=all&q=title:T%C3%BCbingen&wt=python
>
>The problems start to occur, when I click on this url:
>{
>  'responseHeader':{
>'status':0,
>'QTime':0,
>'params':{
>  'q':u'title:T\u00fcbingen',
>  'echoParams':'all',
>  'wt':'python'}},
>  'response':{'numFound':0,'start':0,'docs':[]
>  }}
>So it seems that internally, Solr is changing the request (or a library it
>uses?). I just don’t have any idea why. But I would like to get the
>more than 3 million results. I could as well just enter the above url
>into my browser and the url will be changed to
>http://localhost:8983/solr/mmc_search3/select?echoParams=all&q=title:Tübingen&wt=python
>
>and I get the same result (no documents found). So this is the problem.
>However, when I copy-paste the URL, it is still displaying the utf8
>encoding. I think the “ü” in the URL is just an improved layout by the
>browser.
>
>The confusion with the different Solr versions comes from the fact that I am
>continuously trying to improve my search index and make it more
>efficient. Hence I reindexed it several times, always to the latest
>version. The last reindexing occurred for Solr 7.0.1. having the
>indexing for Lucene 7.0.1. However, I performed the test also for other
>versions without any success.
>
>3) As Rick said: "With the Yahoo Flickr Creative Commons 100 Million
>(YFCC100m) dataset, a great novel dataset was introduced to the
>computer vision and multimedia research community." — cool
>
>My objective it to make it better usable, especially by providing
>different search modalities. The dataset consists of 99 Million images
>and 800k videos, but I am only working on the Flickr as well as
>generated metadata and try to add more and more metadata. The next big
>challenge is similarity search.
>
>4)
>http://localhost:8983/solr/mmc_search3/select?echoParams=all&q=title:Tübingen&wt=python
>
>is displayed but it is
>http://localhost:8983/solr/mmc_search3/select?echoParams=all&q=title:T%C3%BCbingen&wt=python
>.
>
>5) I am searching for Tübingen. It is u-umlaut (LATIN SMALL LETTER U
>WITH DIAERESIS) as Rick said.
>
>6) I am just clicking on it in the admin solr standard interface. I
>could as well copy it into my webbrowser and open it. The result would
>be the same.
> 
>
>7) As you can see in the result, the document seems to be indexed
>correctly, isn’t it? If we can’t figure anything out, I will try to
>reindex again but this will take a while because of the large amount of
>data and my limited compute power.
>
>8) Thanks for the hint with echoparams. The result is displayed above.
>
>9) As shown in the attached search result, there are actually results
>correctly indexed.
>
>10) The above example is now with Python.
>
>11) @Rick: Shall I change the /select handler? I do not quite
>understand the problem with it. But maybe as an explanation, my
>original config was probably based on solr4.x. I basically just updated
>the Lucene version and I had to 

Re: recent utf8 problems

2017-11-06 Thread Rick Leir
Hoss
Clearly it is 
U+00FC  ü   c3 bc   LATIN SMALL LETTER U WITH DIAERESIS
As in Tübingen
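
A quick check of that encoding, as a minimal Python sketch:

from urllib.parse import quote

# c3 bc is the UTF-8 encoding of U+00FC, which percent-encodes to %C3%BC
print("Tübingen".encode("utf-8").hex(" "))  # 54 c3 bc 62 69 6e 67 65 6e
print(quote("title:Tübingen"))              # title%3AT%C3%BCbingen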

"With the Yahoo Flickr Creative Commons 100 Million (YFCC100m) dataset, a great 
novel dataset was introduced to the computer vision and multimedia research 
community." -- cool

I think it is strange that the /select handler was completely default. In my 
experience there is some sort of config for it in solrconfig.xml.

In the SolrAdmin console, query pane, when you have entered some params and 
done a search it shows you the complete URL above the results. Does that match 
your select query?

We could ask "what changed just before it broke". I suspect there was 
something other than the Java upgrade, but it will be interesting if that is 
actually the cause.
Cheers -- Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: recent utf8 problems

2017-11-06 Thread Rick Leir
Dr. Krell
You could look at your /select query handler, and compare it with the /query 
query handler in the Admin config. 

Did you upgrade from a previous version of Solr? Or change your config (no, 
you must have thought of that). If it is a bug related to the Java upgrade then 
you need to show your config before folks can help.
Cheers -- Rick


On November 4, 2017 5:11:36 PM EDT, "Dr. Mario Michael Krell" 
 wrote:
>Hi,
>
>We recently discovered issues with solr with converting utf8 code in
>the search. One or two month ago everything was still working.
>
>- What might have caused it is a Java update (Java 8 Update 151). 
>- We are using firefox as well as chrome for displaying results.
>- We tested it with Solr 6.5, Solr 7.0.0, 7.0.1, and 7.1.
>
>We created a search engine base on the yfcc100m and in the normal
>browser (http://localhost:8983/solr/#/mmc_search3/query
>), we can search for
>"title:T%C3%BCbingen” in the query field and get more than 3 million
>results:
>
>{
>  "responseHeader":{
>"status":0,
>"QTime":103},
>  "response":{"numFound":3092484,"start":0,"docs":[
>  {
>"photoid":"6182384834",
>
>However, when we use the respective web-address, 
>http://localhost:8983/solr/mmc_search3/select?q=title:T%C3%BCbingen=json
>
>The results are deduced to zero:
>{
>  "responseHeader":{
>"status":0,
>"QTime":0},
>  "response":{"numFound":0,"start":0,"docs":[]
>  }}
>
>responseHeader 
>status 0
>QTime  0
>response   
>numFound   0
>start  0
>docs   []
>
>I would be happy for any suggestions on how to fix this problem. For me
>it seems like a bug in solr caused by Java.
>
>Best,
>
>Mario

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Fwd: configuring Solr with Tesseract

2017-11-06 Thread Rick Leir
Anand,
As Charlie says you should have a separate process for this. Also, if you go 
back about ten months in this mailing list you will see some discussion about 
how OCR can take minutes of CPU per page, and needs some preprocessing with 
Imagemagick or Graphicsmagick. You will want to do some fine-tuning with this, 
then save your OCR output in a DB or the filesystem. Then you will want to be 
able to re-index easily as you fine-tune Solr. 

Yes, use Python or your preferred scripting language.
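
A rough shape for that pipeline, as a hedged Python sketch (the image path, 
core name, and field names are assumptions):

import json
import subprocess
import urllib.request

# Hedged sketch: OCR one image with Tesseract, keep the text, post to Solr.
def ocr(image_path):
    # "tesseract <image> stdout" prints the recognized text to stdout
    out = subprocess.run(["tesseract", image_path, "stdout"],
                         capture_output=True, text=True, check=True)
    return out.stdout

text = ocr("page-001.png")
with open("page-001.txt", "w") as f:   # keep OCR output for cheap re-indexing
    f.write(text)

doc = {"id": "page-001", "content": text}
req = urllib.request.Request(
    "http://localhost:8983/solr/mycore/update?commit=true",
    data=json.dumps([doc]).encode("utf-8"),
    headers={"Content-Type": "application/json"})
urllib.request.urlopen(req)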
Cheers -- Rick

On November 6, 2017 4:05:42 AM EST, Charlie Hull  wrote:
>On 03/11/2017 15:32, Admin eLawJournal wrote:
>> Hi,
>> I have read that we can use tesseract with solr to index image files.
>I
>> would like some guidance on setting this up.
>> 
>> Currently, I am using solr for searching my wordpress installation
>via the
>> WPSOLR plugin.
>> 
>> I have Solr 6.6 installed on ubuntu 14.04 which is working fine with
>> wordpress.
>> 
>> I have also installed tesseract but have no clue on configuring it.
>> 
>> 
>> I am new to solr so will greatly appreciate a detailed step by step
>> instruction.
>
>Hi,
>
>I'm guessing if you're using a preconfigured Solr plugin for WP you 
>probably haven't got your hands properly dirty with Solr yet.
>
>One way to use Tesseract would be via Apache Tika 
>https://wiki.apache.org/tika/TikaOCR which is an awesome library for 
>extracting plain text from many different document formats and types. 
>There's a direct way to use Tesseract from within Solr (the 
>ExtractingRequestHandler 
>https://lucene.apache.org/solr/guide/6_6/uploading-data-with-solr-cell-using-apache-tika.html#uploading-data-with-solr-cell-using-apache-tika)
>
>but we don't generally recommend this, as dodgy files can sometimes eat
>
>all your resources during parsing and if Tika dies then so does Solr.
>We 
>usually process the files externally and then feed them to Solr using
>its 
>HTTP API.
>
>Here's one way to do it - a simple server wrapper around Tika 
>https://github.com/mattflax/dropwizard-tika-server written by my 
>colleague Matt Pearce.
>
>So you're going to need to do some coding I think - Python would be a 
>good choice - to feed your source files to Tika for OCR and extraction,
>
>and then the resulting text to Solr for indexing.
>
>Cheers
>
>Charlie
>
>> 
>> Thank you very much
>> 
>
>
>-- 
>Charlie Hull
>Flax - Open Source Enterprise Search
>
>tel/fax: +44 (0)8700 118334
>mobile:  +44 (0)7767 825828
>web: www.flax.co.uk

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Incomplete Index

2017-10-31 Thread Rick Leir
Dawg,
I have a similar setup, and this is what works for me. I have a field which 
contains a timestamp. The timestamp is set to be identical for all documents 
added/updated in a run. When the run is complete and some/many documents have 
been overwritten, I can delete all un-updated documents easily: they have a 
previous timestamp.
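
A minimal sketch of that pattern in Python (core name and field names are 
assumptions):

import json
import urllib.request

# Hedged sketch: stamp every doc in a run, then delete anything older.
RUN_TS = "2017-10-31T12:00:00Z"    # identical for every doc in this run
base = "http://localhost:8983/solr/mycore/update?commit=true"

def post(payload):
    req = urllib.request.Request(base,
                                 data=json.dumps(payload).encode("utf-8"),
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

# 1) add/overwrite documents, all carrying the run timestamp
post({"add": {"doc": {"id": "sku-1", "name": "widget", "run_ts": RUN_TS}}})

# 2) when the run is complete, remove everything the run did not touch
post({"delete": {"query": "run_ts:[* TO 2017-10-31T12:00:00Z}"}})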
Cheers -- Rick


On October 31, 2017 7:54:22 AM EDT, "Emir Arnautović" 
 wrote:
>Hi,
>There is a possibility that you ended up with documents with the same
>ID and that you are overwriting docuements instead of writing new.
>
>In any case, I would suggest you change your approach in case you have
>enough disk space to keep two copies of indices:
>1. use alias to read data from index instead of index name
>2. index data into new index
>3. after verification (e.g. quick check would be number of docs) switch
>alias to new index
>4. keep old index available in case you need to switch back.
>5. before indexing next index, delete one from previous day to free up
>space.
>
>In case you have updates during day you have to account for that as
>well - stop updating while indexing new index; update both indices if
>want to be able to switch back at any point etc.
>
>HTH,
>Emir
>--
>Monitoring - Log Management - Alerting - Anomaly Detection
>Solr & Elasticsearch Consulting Support Training - http://sematext.com/
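
Emir's step 3, flipping the alias, is a single Collections API call; a hedged 
sketch in Python (alias and collection names are assumptions):

import urllib.parse
import urllib.request

# Hedged sketch: point the read alias at the freshly built collection.
params = urllib.parse.urlencode({
    "action": "CREATEALIAS",
    "name": "products",                  # the alias the storefront queries
    "collections": "products_20171031",  # today's newly indexed collection
})
urllib.request.urlopen(
    "http://localhost:8983/solr/admin/collections?" + params)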
>
>
>
>> On 31 Oct 2017, at 11:20, o1webdawg  wrote:
>> 
>> I have an index with about a million documents.  It is the backend
>for a
>> shopping cart system.
>> 
>> Sometimes the inventory gets out of sync with solr and the storefront
>> contains out of stock items.
>> 
>> So I setup a scheduled task on the server to run at 12am every
>morning to
>> delete the entire solr index.
>> 
>> Then at 12:04am I run another scheduled task to re-index the SQL
>database
>> containing the inventory.
>> 
>> Well, today I check it around 4am and only a fraction of the products
>are in
>> the solr index.
>> 
>> However, it did not seem to be idle and reloading it showed lots of
>deleted
>> documents.
>> 
>> 
>> I open up the core and the deletes keep going up, max docs goes up,
>but the
>> total docs stays the same.
>> 
>> It's really confusing me what is happening at this point and why I am
>> viewing these numbers of docs.
>> 
>> My theory is that the 12am delete is still running 5 hours later at
>the same
>> time as the re-indexing.
>> 
>> That's the only way I can explain this really odd behavior with my
>limited
>> knowledge.
>> 
>> Is my theory realistic and could the delete still be running?
>> 
>> 
>> 
>> 
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Some problems in SOLR-6.5.1

2017-10-25 Thread Rick Leir
Klin,
You need to use the new version's solrconfig.xml, with modifications as 
necessary. Start by looking at the current solrconfig: what was modified there?

Did you re-index? If you cannot reindex then you should upgrade to 5.n then to 
6.m.
Cheers -- Rick

On October 24, 2017 11:21:48 PM EDT, SOLR4189  wrote:
>Two days ago we upgraded our SOLR servers from version 4.10.1 to
>6.5.1. We
>explored logs and saw too many errors like:
>
>1)
>org.apache.solr.common.SolrException;
>null:java.lang.NullPointerException
>  at
>org.apache.solr.search.grouping.distributed.responseprocessor.StoredFieldsShardResponseProcessor.process(StoredFieldsShardResponseProcessor.java:41)
>  at
>org.apache.solr.handler.component.QueryComponent.handleGroupedResponses(QueryComponent.java:771)
> . . .
>
>We don't know from which queries it throws.
>
>2) Second error or something strange that we saw in logs - sometimes
>SOLR
>service restarts automatically without any error
>
>Can somebody help to us? Does someone have problems like ours?
>
>
>
>
>--
>Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: BlendedTermQuery for Solr?

2017-10-25 Thread Rick Leir
James
It looks as if Markus could help:
http://lucene.472066.n3.nabble.com/BlendedTermQuery-causing-negative-IDF-td4271289.html

Also, ES has a query. You could look at the source there.
"BlendedTermQuery forms the guts behind Elasticsearch’s cross_field search. -- 
Doug Turnbull

Cheers -- Rick

On October 25, 2017 2:11:39 AM EDT, James  wrote:
> 
>
>On my Solr 6.6 server I'd like to use BlendedTermQuery.
>
>https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/search/BlendedTe
>rmQuery.html
>
> 
>
>I know it is a Lucene class. Is there a Solr API available to access
>it? If
>not, maybe some workaround?
>
> 
>
>Thanks!

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Upload/update full schema and solrconfig in standalone mode

2017-10-21 Thread Rick Leir
Alessandro,
Scp is "secure cp" and is a part of the ssh service. So if you have ssh access 
then you can use scp. From Windows you would be using WinSCP. Many hosts 
provide this but not all. 

If you send files to the ops staff then they can coordinate the restart and any 
fallback planning, so that might be the best bet.
Cheers -- Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Upload/update full schema and solrconfig in standalone mode

2017-10-20 Thread Rick Leir
Alessandro
First, let me say that the whole idea makes me nervous.
1/ are you better off with scp? I would not want to do this via Solr API
2/ the right way to do this is with Ansible, Puppet or Docker.
3/ would you like to update a 'QA' installation, test it, then flip it into 
production? Cheers -- Rick
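
If you do push the files with scp (option 1), remember that standalone Solr 
only picks them up on a core reload; a minimal sketch (host and core name are 
assumptions):

import urllib.request

# Hedged sketch: after copying schema/solrconfig into the core's conf dir,
# ask CoreAdmin to reload that core. "mycore" is an assumption.
urllib.request.urlopen(
    "http://localhost:8983/solr/admin/cores?action=RELOAD&core=mycore")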

On October 20, 2017 8:49:14 AM EDT, Alessandro Hoss  wrote:
>Hello,
>
>Is it possible to upload the entire schema and solrconfig.xml to a Solr
>running on standalone mode?
>
>I know about the Config API
>, but it
>allows
>only add or modify solrconfig properties, and what I want is to change
>the
>whole config (schema and solrconfig) to ensure it's up to date.
>
>What I need is something similar to the Configsets API
>, where
>I'm
>able to upload a zip containing both schema and solrconfig.xml, but
>unfortunately it's SolrCloud only.
>
>Is there a way of doing that in standalone mode?
>
>Thanks in advance.
>Alessandro Hoss

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Goal: reverse chronological display Methods? (1) boost, and/or (2) disable idf

2017-10-20 Thread Rick Leir
Bill,
In the debug score calculations, the bf boosting does not appear at all. I 
would expect it to at least show up with a small value. So maybe we need to 
look at the query. 
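
For reference, a recency boost normally rides in on a bf parameter and then 
shows up inside the debug explain; a hedged sketch of such a request (the field 
name is an assumption):

import urllib.parse

# Hedged sketch: an edismax query with the standard recency boost recipe.
params = urllib.parse.urlencode({
    "q": "solr",
    "defType": "edismax",
    "bf": "recip(ms(NOW,publish_date),3.16e-11,1,1)",  # newer docs score higher
    "debugQuery": "true",  # the recip(...) term should appear in the explain
})
print("http://localhost:8983/solr/mycore/select?" + params)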
Cheers -- Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Deploy Solr to Production: guides, best practices

2017-10-19 Thread Rick Leir
Maximka
The app server (Jetty) is bundled with Solr, so you do not install Tomcat or 
Jetty separately. 
Cheers -- Rick

On October 19, 2017 2:01:30 AM EDT, maximka19  wrote:
>Hi everyone!
>
>I was looking for a full-text search engine and chose Solr, and quickly
>got introduced to it. Now I'm having trouble taking Solr to
>production
>under Windows Server.
>
>As You know, from Solr 5 there is no .WAR-file in package; I couldn't
>deploy
>Solr 7.1 to Tomcat 9. Didn't found any information, tutorials, guides
>relevantly to new versions of both Solr and Tomcat.
>
>So, the first question that comes: do I need to use default Jetty
>container
>in production? Or Tomcat is more preferable in production ways? If so,
>why?
>For what reasons? In older (and the only) books about Solr I've read
>the
>Tomcat is more efficient in production than the default Jetty. The book was
>considering Solr 3 and Tomcat 6. Nowadays versions are much higher. If
>we
>can use Jetty in production, how to deploy Solr with Jetty as a service
>in
>Windows Server? There are no scripts provided for Windows users, only
>for
>.NIX-users.
>
>I've been struggling with this question for two weeks, really. There is NO
>relevant
>information on such questions, even in the official documentation. And the
>other
>thing: do Solr users know anything else about deploying Solr to
>production?
>Any bugs, recommendations, best practices? Or everything goes
>out-of-the-box? 
>
>
>I really need help, advices and guides in this question.
>Thank You
>
>
>
>
>--
>Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Several critical vulnerabilities discovered in Apache Solr (XXE & RCE)

2017-10-15 Thread Rick Leir
Thanks Florian, Jan!
The Unix way (starting 40 years ago) was small programs working together via 
pipes, and now services. Maybe Solr should not run executables; leave that task 
to ssh. The security-conscious folks would probably prefer that we take that 
feature out of Solr.
Cheers -- Rick

On October 15, 2017 10:52:15 AM EDT, "Jan Høydahl" <jan@cominvent.com> 
wrote:
>I think Config API came in 5.0 through
>https://issues.apache.org/jira/browse/SOLR-6533
>
>--
>Jan Høydahl, search solution architect
>Cominvent AS - www.cominvent.com
>
>> 15. okt. 2017 kl. 15:29 skrev Florian Gleixner <f...@redflo.de>:
>> 
>> On 13.10.2017 15:13, Rick Leir wrote:
>>> Hi all,
>>> What is the earliest version which was vulnerable?
>>> Thanks -- Rick
>>> 
>> 
>> As far as i can understand, to exploit both vulnerabilities, you need
>> Solr 5.1 or above (xml query parser), but the RunExecutableListener
>was
>> also present in Solr 3.X. But i dont know when the config api was
>> introduced.
>> 

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Rick Leir

On 2017-10-13 04:19 PM, Kevin Layer wrote:

Amrit Sarkar wrote:


Kevin,

fileType => md is not a recognizable format in SimplePostTool; anyway, moving
on.

OK, thanks.  Looks like I'll have to abandon using solr for this
project (or find another way to crawl the site).

Thank you for all the help, though.  I appreciate it.

Ha, these messages crash my Android mail client! Now...

Did you try Nutch? Or the Norconex HTTP crawler? Tika? Or any Python 
crawler, posting its documents to the Solr API.
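
The last option can be surprisingly small; a hedged Python sketch (the site 
URL, core name, and field names are assumptions):

import json
import re
import urllib.request

# Hedged sketch: fetch one page, crudely strip tags, post it to Solr.
# A real crawler adds link-following, politeness delays, and error handling.
url = "http://example.com/"
html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
text = re.sub(r"<[^>]+>", " ", html)

doc = {"id": url, "url": url, "content": " ".join(text.split())}
req = urllib.request.Request(
    "http://localhost:8983/solr/mycore/update?commit=true",
    data=json.dumps([doc]).encode("utf-8"),
    headers={"Content-Type": "application/json"})
urllib.request.urlopen(req)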

cheers -- Rick


Re: Several critical vulnerabilities discovered in Apache Solr (XXE & RCE)

2017-10-13 Thread Rick Leir
Hi all,
What is the earliest version which was vulnerable?
Thanks -- Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Solr related questions

2017-10-13 Thread Rick Leir
1/ the _version_ field is necessary.
2/ there is a Solr API for editing the managed schema (see the sketch below).
3/ not having used SolrNet, I suspect you can bypass it and use the Solr REST 
API directly.
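
A minimal sketch of point 2/ in Python (host, core, and the field definition 
are assumptions):

import json
import urllib.request

# Hedged sketch: add a field through the Schema API.
payload = {"add-field": {"name": "title", "type": "text_general", "stored": True}}
req = urllib.request.Request(
    "http://localhost:8983/solr/mycore/schema",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"})
urllib.request.urlopen(req)

The same handler answers GET requests, which also covers point 3/ for anything 
SolrNet does not expose.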
Cheers -- Rick

On October 13, 2017 5:40:26 AM EDT, startrekfan  
wrote:
>Hello,
>
>I have some Solr related questions:
>
>1.) I created a core and tried to simplify the managed-schema file. But
>if
>I remove all "unecessary" fields/fieldtypes, I get errors like: field
>"_version_" is missing, type "boolean" is missing and so on. Why do I
>have
>to define this types/fields? Which fields/fieldtypes are required?
>
>2.) Can I modify the managed-schema remotly/by program e.g. with a post
>request or only by editing the managed-schema file directly?
>
>3.) When I have a service (SolrNet client) that pushes a file from a
>fileserver to Solr, will it cause twice the traffic? (from the
>fileserver
>to my service and from the service to Solr?) Is there a way to index
>the
>file directly? (I need to add additional attributes to the index
>document)
>
>Thank you

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Appending fields to pre-existed document

2017-10-13 Thread Rick Leir
Hi
Show us the Solr version, field types, the handler definition, and the query 
you send. Any log entries?
Cheers -- Rick

On October 13, 2017 5:57:16 AM EDT, "Игорь Абрашин"  
wrote:
>Hello, solr community.
>We are getting strugled with updating already existing docs. For
>instance,
>we got indexed one jpg with tika parser and got batch of attributes.
>Then
>we want to index database datasource and append those fields to our
>document with the same uniquekey, stored at schema.xml. And all what we
>got
>only a overwriting doc came first by new one. Ok just put
>overwrite=false
>to params, and dublicating docs appeare. So do you have some clues or
>suggesstions related to that. How to append one batch of attribute to
>another? Or maybe how to merge them after duplicate was created?

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: book on solr

2017-10-13 Thread Rick Leir
Jay, get info on this with a search: 
https://www.google.ca/search?q=solr+shard+size


cheers -- Rick

On 2017-10-13 01:42 AM, Jay Potharaju wrote:

Any blog or documentation also that would provide some basic rules or
guidelines for scaling would also be great.

Thanks
Jay Potharaju





Re: ERROR ipc.AbstractRpcClient: SASL authentication failed

2017-10-04 Thread Rick Leir

Ascot,

At the risk of ...   Can you disable Kerberos in HBase? If not, then you 
will have to provide credentials!
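
If Kerberos has to stay on, the trace's own hint ("Consider 'kinit'") is the 
usual fix: obtain a ticket for the indexer's principal before it starts. A 
hedged sketch (the keytab path and principal are assumptions):

import subprocess

# Hedged sketch: get a Kerberos ticket from a keytab, then verify it.
subprocess.run(
    ["kinit", "-kt", "/etc/security/keytabs/hbase-indexer.keytab",
     "hbase-indexer/host.example.com@EXAMPLE.COM"],
    check=True)
subprocess.run(["klist"], check=True)   # confirm the TGT is present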


Rick


On 2017-10-04 07:32 PM, Ascot Moss wrote:

Does anyone use hbase indexer in index kerberos Hbase to solr?

Pls help!

On Wed, Oct 4, 2017 at 10:18 PM, Ascot Moss  wrote:


Hi,

I am trying to use hbase-indexer to index hbase table to Solr,

Solr 6.6
Hbase-Indexer 1.6
Hbase 1.2.5 with Kerberos enabled,


After putting new test rows into the Hbase table, I got the following
error from hbase-indexer thus it cannot write the data to solr :

WARN ipc.AbstractRpcClient: Exception encountered while connecting to the
server : javax.security.sasl.SaslException: GSS initiate failed [Caused
by GSSException: No valid credentials provided (Mechanism level: Failed to
find any Kerberos tgt)]

ERROR ipc.AbstractRpcClient: SASL authentication failed. The most likely
cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Failed to
find any Kerberos tgt)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChalleng
e(GssKrb5Client.java:211)
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConn
ect(HBaseSaslRpcClient.java:179)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSa
slConnection(RpcClientImpl.java:609)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$
600(RpcClientImpl.java:154)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(
RpcClientImpl.java:735)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(
RpcClientImpl.java:732)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
upInformation.java:1698)
at


Any idea how to resolve this issue?

Regards





Re: Time to Load a Solr Core with Hdfs Directory Factory

2017-10-04 Thread Rick Leir

Shashank,

I had a quick look at:

https://lucene.apache.org/solr/guide/6_6/running-solr-on-hdfs.html

Did you enable the Block Cache and the solr.hdfs.nrtcachingdirectory?
cheers -- Rick

On 2017-10-03 09:22 PM, Shashank Pedamallu wrote:

Hi,

I’m trying an experiment in which I’m loading a core of 1.27GB with 5621600 
documents on 2 Solr setups. On the first setup, dataDir points to a 
NRTCachingDirectory at a standard local path. On the second setup, it points to 
an HdfsDirectory. As part of loading the core, I see the following log:

2017-10-04 01:07:50.102 UTC INFO  
(searcherExecutor-12-thread-1-processing-x:staging_1gb-core-1) 
[core='x:staging_1gb-core-1'] org.apache.solr.core.SolrCore@2247 
[staging_1gb-core-1] Registered new searcher 
Searcher@10fe9415[staging_1gb-core-1] 
main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_12bk(6.4.2):C2886542)
 Uninverting(_1eu5(6.4.2):C743800) Uninverting(_18kl(6.4.2):c331485) 
Uninverting(_1lt0(6.4.2):c284012) Uninverting(_1xx5(6.4.2):C654477) 
Uninverting(_1qsg(6.4.2):C658237) Uninverting(_1qf4(6.4.2):c24903) 
Uninverting(_1xwv(6.4.2):c16734) Uninverting(_1xwb(6.4.2):c1) 
Uninverting(_1xww(6.4.2):C174) Uninverting(_1xy9(6.4.2):c878) 
Uninverting(_1xxf(6.4.2):c354) Uninverting(_1xxp(6.4.2):c508) 
Uninverting(_1xx6(6.4.2):C150) Uninverting(_1xxz(6.4.2):c545) 
Uninverting(_1xxg(6.4.2):C190) Uninverting(_1xyj(6.4.2):c690) 
Uninverting(_1xyd(6.4.2):C144)))}

This step takes about 132 milliseconds in setup 1 (i.e., with 
NRTCachingDirectoryFactory).
The same step takes about 21 minutes on the second setup (i.e., with 
HdfsDirectoryFactory).

Does the load time of a Solr core drop so badly on an HDFS filesystem? Is this 
expected?

Thanks,
Shashank





Re: SOLR terminology

2017-09-28 Thread Rick Leir

Gunalan,

Solr Core (core) - a single physical index with its own config and data 
directory. A core holds a shard of a collection, or a replica of a shard.
Collection - one or more shards grouped together into a single logical index; 
shards can be replicated for reliability, availability and performance.
Node - a single Solr server process (JVM); one node can host many cores.
SolrCluster - the group of Solr nodes, coordinated by ZooKeeper.

cheers -- Rick

On 2017-09-27 10:27 PM, Gunalan V wrote:

Hello,

Could someone please tell me the difference between Solr Core (core),
Collections, Nodes, SolrCluster referred in SolrColud. It's bit confusing.

If there are any diagrammatic representation or example please share me.


Thanks!





Re: Replicates not recovering after rolling restart

2017-09-22 Thread Rick Leir

Wunder, Erick

$ dc
16o
1578578283947098112p
15E83C95E8D00000

That is an interesting number. Is it, as a guess, machine instructions 
or an address pointer? It does not look like UTF-8 or ASCII. Machine 
code looks promising:



Disassembly:

0:  15 e8 3c 95 e8    adc    eax,0xe8953ce8
5:  d0 00             rol    BYTE PTR [rax],1


ADC dest,src - modifies flags AF CF OF SF PF ZF; sums two binary operands,
placing the result in the destination.


ROL - rotate left.

Registers: the 64-bit extension of eax is called rax.

Is that code possibly in the JVM executable? Or a random memory page.

cheers -- Rick

On 2017-09-20 07:21 PM, Walter Underwood wrote:

1578578283947098112 needs 61 bits. Is it being parsed into a 32 bit target?

That doesn’t explain where it came from, of course.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
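
Walter's arithmetic checks out; a quick Python check:

n = 1578578283947098112
print(n.bit_length())   # 61 -- too wide for a 32-bit int
print(n > 2**31 - 1)    # True: Integer.parseInt() has to throw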



On Sep 20, 2017, at 3:35 PM, Erick Erickson  wrote:

The numberformatexception is...odd. Clearly that's too big a number
for an integer, did anything in the underlying schema change?

Best,
Erick

On Wed, Sep 20, 2017 at 3:00 PM, Walter Underwood  wrote:

Rolling restarts work fine for us. I often include installing new configs with 
that. Here is our script. Pass it any hostname in the cluster. I use the load 
balancer name. You’ll need to change the domain and the install directory of 
course.

#!/bin/bash

cluster=$1

hosts=`curl -s 
"http://${cluster}:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json" | 
jq -r '.cluster.live_nodes[]' | sort`

for host in $hosts
do
host="${host}.cloud.cheggnet.com"
echo restarting Solr on $host
ssh $host 'cd /apps/solr6 ; sudo -u bin bin/solr stop; sudo -u bin bin/solr 
start -cloud -h `hostname`'
done


Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Sep 20, 2017, at 1:42 PM, Bill Oconnor  wrote:

Hello,


Background:


We have been successfully using Solr for over 5 years and we recently made the 
decision to move into SolrCloud. For the most part that has been easy but we 
have repeated problems with our rolling restart were server remain functional 
but stay in Recovery until they stop trying. We restarted because we increased 
the memory from 12GB to 16GB on the JVM.


Does anyone have any insight as to what is going on here?

Is there a special procedure I should use for starting a stopping host?

Is it ok to do a rolling restart on all the nodes in s shard?


Any insight would be appreciated.


Configuration:


We have a group of servers with multiple collections. Each collection consist 
of one shard and multiple replicates. We are running the latest stable version 
of SolrClound 6.6 on Ubuntu LTS and Oracle Corporation Java HotSpot(TM) 64-Bit 
Server VM 1.8.0_66 25.66-b17


(collection)  (shard)  (replicates)

journals_stage   ->  shard1  ->  solr-220 (leader) , solr-223, solr-221, 
solr-222 (replicates)


Problem:


Restarting the system puts the replicates in a recovery state they never exit 
from. They eventually give up after 500 tries.  If I go to the individual 
replicates and execute a query the data is still available.


Using tcpdump I find the replicates sending this request to the leader (the 
leader appears to be active).


The exchange goes  like this - :


solr-220 is the leader.

Solr-221 to Solr-220


10:18:42.426823 IP solr-221:54341 > solr-220:8983:


POST /solr/journals_stage_shard1_replica1/update HTTP/1.1
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
User-Agent: 
Solr[org.apache.solr.client.solrj.impl.HttpSolrClient]
 1.0
Content-Length: 108
Host: solr-220:8983
Connection: Keep-Alive


commit_end_point=true=false=true=false=true=javabin=2


Solr-220 back to Solr-221


IP solr-220:8983 > solr-221:54341: Flags [P.], seq 1:5152, ack 385, win 235, 
options [nop,nop,
TS val 85813 ecr 858107069], length 5151
..HTTP/1.1 500 Server Error
Content-Type: application/octet-stream
Content-Length: 5060


.responseHeader..%QTimeC.%error..#msg?.For input string: 
"1578578283947098112".%trace?.: For
input string: "1578578283947098112"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:583)
at java.lang.Integer.parseInt(Integer.java:615)
at 
org.apache.lucene.queries.function.docvalues.IntDocValues.getRangeScorer(IntDocValues.java:89)
at 
org.apache.solr.search.function.ValueSourceRangeFilter$1.iterator(ValueSourceRangeFilter.java:83)
at 
org.apache.solr.search.SolrConstantScoreQuery$ConstantWeight.scorer(SolrConstantScoreQuery.java:100)
at org.apache.lucene.search.Weight.scorerSupplier(Weight.java:126)
at org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:400)
at 
