Looking for a guide on how to index a folder of data in Solr 7.2

2019-09-18 Thread Raymond Xie
I remember there is a post.jar in Cloudera's Solr (very old version) that
allows indexing doc like:

java -Dtype=application/json -Drecursive -Durl="
http://localhost:8983/solr/indexer_odac/update/json/docs" -jar post.jar
/tmp/solr_data/data

I don't see the post.jar in Solr 7.2 anymore, it is just "post", not
"post.jar"

Can you tell me the right command to index a folder? I am not able to find
that information in the documentation.

Thank you very much.
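For reference, in recent Solr versions the post.jar functionality is wrapped by bin/post (e.g. `bin/post -c indexer_odac /tmp/solr_data/data`, which recurses into subfolders). As an illustration only, a rough stdlib-Python equivalent of the recursive walk plus JSON POST might look like this; the base URL and core name are assumptions, not from the thread:

```python
import pathlib
import urllib.request

def iter_json_files(root):
    """Recursively collect every *.json file under root (like -Drecursive)."""
    return sorted(p for p in pathlib.Path(root).rglob("*.json") if p.is_file())

def post_json_file(base_url, path):
    """POST one JSON file to the collection's /update/json/docs endpoint."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/update/json/docs?commit=true",
        data=path.read_bytes(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # raises on HTTP errors
        return resp.status

# Example usage (assumed core name, commented out):
# base = "http://localhost:8983/solr/indexer_odac"
# for f in iter_json_files("/tmp/solr_data/data"):
#     post_json_file(base, f)
```

The walk-then-post split also makes it easy to parallelize the posting later.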

On Tue, Sep 17, 2019 at 9:42 AM Raymond Xie  wrote:

> Thank you Paras for your reply. Yes, I had downloaded the source; after
> re-downloading the binary, it is working as expected here.
>
> **
> *Sincerely yours,*
>
>
> *Raymond*
>
>
> On Tue, Sep 17, 2019 at 9:26 AM Paras Lehana 
> wrote:
>
>> Hi Raymond,
>>
>> ERROR: start.jar file not found in /opt/solr-8.2.0/solr/server!
>>>
>>
>> You had probably downloaded the source version. *Download the binary one*
>> (TGZ
>> <https://www.apache.org/dyn/closer.lua/lucene/solr/8.2.0/solr-8.2.0.tgz>
>> or ZIP
>> <https://www.apache.org/dyn/closer.lua/lucene/solr/8.2.0/solr-8.2.0.zip>).
>> Yes, it does mean that the package is incomplete.
>>
>> On Tue, 17 Sep 2019 at 18:40, Raymond Xie  wrote:
>>
>>> Thank you Paras:
>>>
>>> If I am already the root user, should I still run sudo? No, it doesn't work:
>>> [root@pocnnr1n1 solr]# sudo bin/solr start -force
>>> sudo: bin/solr: command not found
>>>
>>> [root@pocnnr1n1 solr]# ls -l bin/solr
>>> -rw-r--r-- 1 root root 80630 Jul 19 09:09 bin/solr
>>>
>>> So, I followed your suggestion and added +x, now run the command again:
>>> [root@pocnnr1n1 solr]# bin/solr start
>>> *** [WARN] *** Your open file limit is currently 1024.
>>>  It should be set to 65000 to avoid operational disruption.
>>>  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to
>>> false in your profile or solr.in.sh
>>> *** [WARN] ***  Your Max Processes Limit is currently 63397.
>>>  It should be set to 65000 to avoid operational disruption.
>>>  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to
>>> false in your profile or solr.in.sh
>>> WARNING: Starting Solr as the root user is a security risk and not
>>> considered best practice. Exiting.
>>>  Please consult the Reference Guide. To override this check,
>>> start with argument '-force'
>>>
>>> alright, I'll add the -force option:
>>> [root@pocnnr1n1 solr]# bin/solr start -force
>>> *** [WARN] *** Your open file limit is currently 1024.
>>>  It should be set to 65000 to avoid operational disruption.
>>>  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to
>>> false in your profile or solr.in.sh
>>> *** [WARN] ***  Your Max Processes Limit is currently 63397.
>>>  It should be set to 65000 to avoid operational disruption.
>>>  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to
>>> false in your profile or solr.in.sh
>>>
>>> ERROR: start.jar file not found in /opt/solr-8.2.0/solr/server!
>>> Please check your -d parameter to set the correct Solr server directory.
>>>
>>> BUT: the folder server is there, and no start.jar
>>>
>>> [root@pocnnr1n1 solr]# ls /opt/solr-8.2.0/solr
>>> bin          common-build.xml  licenses            server          webapp
>>> bin-test     contrib           LICENSE.txt         site
>>> build.xml    core              LUCENE_CHANGES.txt  solrj
>>> CHANGES.txt  docs              NOTICE.txt          solr-ref-guide
>>> cloud-dev    example           README.txt          test-framework
>>>
>>> [root@pocnnr1n1 server]# ll
>>> total 16
>>> drwxr-xr-x 2 root root6 Jul 19 09:09 solr-webapp
>>> drwxr-xr-x 3 root root   27 Jul 19 09:09 scripts
>>> -rw-r--r-- 1 root root 3959 Jul 19 09:09 README.txt
>>> -rw-r--r-- 1 root root 5740 Jul 19 09:09 ivy.xml
>>> -rw-r--r-- 1 root root 2214 Jul 19 09:09 build.xml
>>> drwxr-xr-x 2 root root  135 Sep 16 21:13 etc
>>> drwxr-xr-x 2 root root   36 Sep 16 21:13 contexts
>>> drwxr-xr-x 2 root root   82 Sep 16 21:13 resources
>>> drwxr-xr-x 2 root root   90 Sep 16 21:13 modules
>>> drwxr-xr-x 3 root root   73 Sep 16 21:13 solr
>>>
>>> *So this seems to be something missing from the solr package?*
>>>
>>> *Thank you.*
>>>
>>> *---

Re: Why I receive permission denied when running as root

2019-09-17 Thread Raymond Xie
Thank you Paras for your reply. Yes, I had downloaded the source; after
re-downloading the binary, it is working as expected here.

**
*Sincerely yours,*


*Raymond*


On Tue, Sep 17, 2019 at 9:26 AM Paras Lehana 
wrote:

> Hi Raymond,
>
> ERROR: start.jar file not found in /opt/solr-8.2.0/solr/server!
>>
>
> You had probably downloaded the source version. *Download the binary one*
> (TGZ
> <https://www.apache.org/dyn/closer.lua/lucene/solr/8.2.0/solr-8.2.0.tgz>
> or ZIP
> <https://www.apache.org/dyn/closer.lua/lucene/solr/8.2.0/solr-8.2.0.zip>).
> Yes, it does mean that the package is incomplete.
>
> On Tue, 17 Sep 2019 at 18:40, Raymond Xie  wrote:
>
>> Thank you Paras:
>>
>> If I am already the root user, should I still run sudo? No, it doesn't work:
>> [root@pocnnr1n1 solr]# sudo bin/solr start -force
>> sudo: bin/solr: command not found
>>
>> [root@pocnnr1n1 solr]# ls -l bin/solr
>> -rw-r--r-- 1 root root 80630 Jul 19 09:09 bin/solr
>>
>> So, I followed your suggestion and added +x, now run the command again:
>> [root@pocnnr1n1 solr]# bin/solr start
>> *** [WARN] *** Your open file limit is currently 1024.
>>  It should be set to 65000 to avoid operational disruption.
>>  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to
>> false in your profile or solr.in.sh
>> *** [WARN] ***  Your Max Processes Limit is currently 63397.
>>  It should be set to 65000 to avoid operational disruption.
>>  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to
>> false in your profile or solr.in.sh
>> WARNING: Starting Solr as the root user is a security risk and not
>> considered best practice. Exiting.
>>  Please consult the Reference Guide. To override this check,
>> start with argument '-force'
>>
>> alright, I'll add the -force option:
>> [root@pocnnr1n1 solr]# bin/solr start -force
>> *** [WARN] *** Your open file limit is currently 1024.
>>  It should be set to 65000 to avoid operational disruption.
>>  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to
>> false in your profile or solr.in.sh
>> *** [WARN] ***  Your Max Processes Limit is currently 63397.
>>  It should be set to 65000 to avoid operational disruption.
>>  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to
>> false in your profile or solr.in.sh
>>
>> ERROR: start.jar file not found in /opt/solr-8.2.0/solr/server!
>> Please check your -d parameter to set the correct Solr server directory.
>>
>> BUT: the folder server is there, and no start.jar
>>
>> [root@pocnnr1n1 solr]# ls /opt/solr-8.2.0/solr
>> bin          common-build.xml  licenses            server          webapp
>> bin-test     contrib           LICENSE.txt         site
>> build.xml    core              LUCENE_CHANGES.txt  solrj
>> CHANGES.txt  docs              NOTICE.txt          solr-ref-guide
>> cloud-dev    example           README.txt          test-framework
>>
>> [root@pocnnr1n1 server]# ll
>> total 16
>> drwxr-xr-x 2 root root6 Jul 19 09:09 solr-webapp
>> drwxr-xr-x 3 root root   27 Jul 19 09:09 scripts
>> -rw-r--r-- 1 root root 3959 Jul 19 09:09 README.txt
>> -rw-r--r-- 1 root root 5740 Jul 19 09:09 ivy.xml
>> -rw-r--r-- 1 root root 2214 Jul 19 09:09 build.xml
>> drwxr-xr-x 2 root root  135 Sep 16 21:13 etc
>> drwxr-xr-x 2 root root   36 Sep 16 21:13 contexts
>> drwxr-xr-x 2 root root   82 Sep 16 21:13 resources
>> drwxr-xr-x 2 root root   90 Sep 16 21:13 modules
>> drwxr-xr-x 3 root root   73 Sep 16 21:13 solr
>>
>> *So this seems to be something missing from the solr package?*
>>
>> *Thank you.*
>>
>> **
>> *Sincerely yours,*
>>
>>
>> *Raymond*
>>
>>
>> On Tue, Sep 17, 2019 at 8:42 AM Paras Lehana 
>> wrote:
>>
>>> Hey Raymond,
>>>
>>>
>>> bin/solr start -force
>>>
>>>
>>> I know this could be useless, but did you try the following (notice the
>>> sudo)?
>>>
>>> *sudo* bin/solr start -force
>>>
>>>
>>> Also, I suggest you review the permissions of the solr file with:
>>>
>>> ls -l bin/solr
>>>
>>>
>>> If you don't see the required executable permissions, you can give the
>>> root user permission to execute the file with:
>>>
>>> *chmod +x bin/solr*
>>>
>>>
>>> Hope this helps.
>>>
>>> On Tue, 17 Sep 2019 at 18:03, Ra

Re: Why I receive permission denied when running as root

2019-09-17 Thread Raymond Xie
Thank you Paras:

If I am already the root user, should I still run sudo? No, it doesn't work:
[root@pocnnr1n1 solr]# sudo bin/solr start -force
sudo: bin/solr: command not found

[root@pocnnr1n1 solr]# ls -l bin/solr
-rw-r--r-- 1 root root 80630 Jul 19 09:09 bin/solr

So, I followed your suggestion and added +x, now run the command again:
[root@pocnnr1n1 solr]# bin/solr start
*** [WARN] *** Your open file limit is currently 1024.
 It should be set to 65000 to avoid operational disruption.
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false
in your profile or solr.in.sh
*** [WARN] ***  Your Max Processes Limit is currently 63397.
 It should be set to 65000 to avoid operational disruption.
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false
in your profile or solr.in.sh
WARNING: Starting Solr as the root user is a security risk and not
considered best practice. Exiting.
 Please consult the Reference Guide. To override this check, start
with argument '-force'

alright, I'll add the -force option:
[root@pocnnr1n1 solr]# bin/solr start -force
*** [WARN] *** Your open file limit is currently 1024.
 It should be set to 65000 to avoid operational disruption.
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false
in your profile or solr.in.sh
*** [WARN] ***  Your Max Processes Limit is currently 63397.
 It should be set to 65000 to avoid operational disruption.
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false
in your profile or solr.in.sh

ERROR: start.jar file not found in /opt/solr-8.2.0/solr/server!
Please check your -d parameter to set the correct Solr server directory.

BUT: the folder server is there, and no start.jar

[root@pocnnr1n1 solr]# ls /opt/solr-8.2.0/solr
bin          common-build.xml  licenses            server          webapp
bin-test     contrib           LICENSE.txt         site
build.xml    core              LUCENE_CHANGES.txt  solrj
CHANGES.txt  docs              NOTICE.txt          solr-ref-guide
cloud-dev    example           README.txt          test-framework

[root@pocnnr1n1 server]# ll
total 16
drwxr-xr-x 2 root root6 Jul 19 09:09 solr-webapp
drwxr-xr-x 3 root root   27 Jul 19 09:09 scripts
-rw-r--r-- 1 root root 3959 Jul 19 09:09 README.txt
-rw-r--r-- 1 root root 5740 Jul 19 09:09 ivy.xml
-rw-r--r-- 1 root root 2214 Jul 19 09:09 build.xml
drwxr-xr-x 2 root root  135 Sep 16 21:13 etc
drwxr-xr-x 2 root root   36 Sep 16 21:13 contexts
drwxr-xr-x 2 root root   82 Sep 16 21:13 resources
drwxr-xr-x 2 root root   90 Sep 16 21:13 modules
drwxr-xr-x 3 root root   73 Sep 16 21:13 solr

*So this seems to be something missing from the solr package?*

*Thank you.*

**
*Sincerely yours,*


*Raymond*
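Regarding the two ulimit warnings above: SOLR_ULIMIT_CHECKS=false only silences the check. A sketch of the two options, with values taken from the warning text (file paths vary by install and distribution, so treat them as assumptions):

```shell
# In solr.in.sh (bin/solr.in.sh, or /etc/default/solr.in.sh on a service
# install) -- suppress the startup check; does NOT raise the actual limits:
SOLR_ULIMIT_CHECKS=false

# Or actually raise the limits, e.g. in /etc/security/limits.conf,
# for the user that runs Solr:
#   solr  soft  nofile  65000
#   solr  hard  nofile  65000
#   solr  soft  nproc   65000
#   solr  hard  nproc   65000
```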


On Tue, Sep 17, 2019 at 8:42 AM Paras Lehana 
wrote:

> Hey Raymond,
>
>
> bin/solr start -force
>
>
> I know this could be useless, but did you try the following (notice the
> sudo)?
>
> *sudo* bin/solr start -force
>
>
> Also, I suggest you review the permissions of the solr file with:
>
> ls -l bin/solr
>
>
> If you don't see the required executable permissions, you can give the
> root user permission to execute the file with:
>
> *chmod +x bin/solr*
>
>
> Hope this helps.
>
> On Tue, 17 Sep 2019 at 18:03, Raymond Xie  wrote:
>
>> Thank you.
>>
>> As I suspected, this is something else. It prompts me the same error here:
>> [root@pocnnr1n1 solr]# bin/solr start -force
>> -bash: bin/solr: Permission denied
>> [root@pocnnr1n1 solr]#
>>
>> It is not good practice to use root directly; however, using root
>> should not produce a permission error. This is odd.
>>
>>
>> **
>> *Sincerely yours,*
>>
>>
>> *Raymond*
>>
>>
>> On Tue, Sep 17, 2019 at 12:49 AM Paras Lehana 
>> wrote:
>>
>>> Hi Raymond,
>>>
>>> It's not recommended to run solr as root (security reasons
>>> <https://lucene.apache.org/solr/guide/8_1/taking-solr-to-production.html#create-the-solr-user>).
>>> Nevertheless, in order to answer your question, try this command:
>>>
>>> *sudo bin/solr start -force*
>>>
>>>
>>> From Solr Control Script Reference
>>> <https://lucene.apache.org/solr/guide/8_1/solr-control-script-reference.html#start-parameters>
>>> :
>>>
>>> If attempting to start Solr as the root user, the script will exit with
>>>> a warning that running Solr as "root" can cause problems. It is possible to
>>>> override this warning with the -force parameter.
>>>
>>>
>>> PS: I suggest you to go through *Service Installation Script *in Taking
>>> Solr 

Re: Why I receive permission denied when running as root

2019-09-17 Thread Raymond Xie
Thank you.

As I suspected, this is something else. It prompts me the same error here:
[root@pocnnr1n1 solr]# bin/solr start -force
-bash: bin/solr: Permission denied
[root@pocnnr1n1 solr]#

It is not good practice to use root directly; however, using root should
not produce a permission error. This is odd.


**
*Sincerely yours,*


*Raymond*


On Tue, Sep 17, 2019 at 12:49 AM Paras Lehana 
wrote:

> Hi Raymond,
>
> It's not recommended to run solr as root (security reasons
> <https://lucene.apache.org/solr/guide/8_1/taking-solr-to-production.html#create-the-solr-user>).
> Nevertheless, in order to answer your question, try this command:
>
> *sudo bin/solr start -force*
>
>
> From Solr Control Script Reference
> <https://lucene.apache.org/solr/guide/8_1/solr-control-script-reference.html#start-parameters>
> :
>
> If attempting to start Solr as the root user, the script will exit with a
>> warning that running Solr as "root" can cause problems. It is possible to
>> override this warning with the -force parameter.
>
>
> PS: I suggest you go through *Service Installation Script* in Taking
> Solr to Production
> <https://lucene.apache.org/solr/guide/8_1/taking-solr-to-production.html>,
> which describes the ideal Solr setup. A must read!
>
> Hope I helped.
>
>
> On Tue, 17 Sep 2019 at 08:37, Raymond Xie  wrote:
>
>> [root@pocnnr1n1 solr-8.2.0]# ll
>> total 88
>> -rw-r--r--  1 root root  4023 Jul 19 09:09 README.md
>> -rw-r--r--  1 root root 32153 Jul 19 09:09 build.xml
>> -rw-r--r--  1 root root 27717 Jul 19 09:09 NOTICE.txt
>> -rw-r--r--  1 root root 12646 Jul 19 09:09 LICENSE.txt
>> drwxr-xr-x 10 root root   174 Sep 16 21:13 dev-tools
>> drwxr-xr-x 30 root root  4096 Sep 16 21:13 lucene
>> drwxr-xr-x 16 root root  4096 Sep 16 21:13 solr
>>
>> [root@pocnnr1n1 solr]# bin/solr start
>> -bash: bin/solr: Permission denied
>>
>> Can anyone please help me to sort it out?
>>
>> Thank you. Regards.
>> **
>> *Sincerely yours,*
>>
>>
>> *Raymond*
>>
>
>
> --
> --
> Regards,
>
> *Paras Lehana* [65871]
> Software Programmer, Auto-Suggest,
> IndiaMART Intermesh Ltd.
>
> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> Noida, UP, IN - 201303
>
> Mob.: +91-9560911996
> Work: 01203916600 | Extn:  *8173*
>
>


Why I receive permission denied when running as root

2019-09-16 Thread Raymond Xie
[root@pocnnr1n1 solr-8.2.0]# ll
total 88
-rw-r--r--  1 root root  4023 Jul 19 09:09 README.md
-rw-r--r--  1 root root 32153 Jul 19 09:09 build.xml
-rw-r--r--  1 root root 27717 Jul 19 09:09 NOTICE.txt
-rw-r--r--  1 root root 12646 Jul 19 09:09 LICENSE.txt
drwxr-xr-x 10 root root   174 Sep 16 21:13 dev-tools
drwxr-xr-x 30 root root  4096 Sep 16 21:13 lucene
drwxr-xr-x 16 root root  4096 Sep 16 21:13 solr

[root@pocnnr1n1 solr]# bin/solr start
-bash: bin/solr: Permission denied

Can anyone please help me to sort it out?

Thank you. Regards.
**
*Sincerely yours,*


*Raymond*
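As the later replies in this digest show, the cause was a missing execute bit on bin/solr: even root cannot exec() a file with no execute permission set. A throwaway reproduction with a stand-in script (the paths and file are illustrative, not the real bin/solr):

```python
import os
import stat
import subprocess
import tempfile

# Create a stand-in "solr" script with no execute bit, like the tarball above.
workdir = tempfile.mkdtemp()
script = os.path.join(workdir, "solr")
with open(script, "w") as f:
    f.write("#!/bin/sh\necho started\n")
os.chmod(script, 0o644)                       # -rw-r--r-- : read/write only

try:
    subprocess.run([script])                  # exec() fails without any x bit
except PermissionError:
    print("Permission denied")                # the symptom in this thread

os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)   # chmod u+x
out = subprocess.check_output([script]).decode().strip()
print(out)                                    # prints: started
```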


Re: How to do parallel indexing on files (not on HDFS)

2018-05-24 Thread Raymond Xie
Thank you all for the suggestions. I'm now leaning toward not using
traditional parallel indexing. My data are JSON files with metadata
extracted from raw data received and archived into our data server cluster.
The data come in various flows and reside in their respective folders;
splitting them might introduce unnecessary extra work and could end up
causing trouble. So instead, maybe it would be easier to simply schedule
multiple separate indexing jobs?

Thanks.

Raymond


On Thu, May 24, 2018 at 11:23 AM, Rahul Singh <rahul.xavier.si...@gmail.com> wrote:

> Resending to list to help more people..
>
> This is an architectural pattern to solve the same issue that arises over
> and over again. The queue can be anything — a table in a database, even a
> collection in Solr.
>
> And yes, I have implemented it — I did it in C# before, using a SQL Server
> table-based queue (http://github.com/appleseed/search-stack) — and then
> made the indexer able to write to Lucene, Elastic or Solr depending on
> config. I'm not actively maintaining this right now, but will consider
> porting it to a Kafka + Spark + Kafka Connect based system when I find time.
>
> In Kafka, however, you have a lot of potential with Kafka Connect. Here is
> an example using Cassandra.
> The premise is the same: Kafka Connect has libraries of connectors for
> different sources / sinks … it may not work for files, but for pure raw
> data, Kafka Connect is good.
>
> Here’s a project that may guide you best.
>
>
> http://saumitra.me/blog/tweet-search-and-analysis-with-kafka-solr-cassandra/
>
> I dont know where this guys code went.. but the content is there with code
> samples.
>
>
>
>
> --
>
> On May 23, 2018, 8:37 PM -0500, Raymond Xie <xie3208...@gmail.com>, wrote:
>
> Thank you Rahul, though that's very high level.
>
> No offense, but do you have a successful implementation, or is it just
> your unproven idea? I have never used Rabbit or Kafka before but would be
> very interested in more detail on the Kafka idea, as Kafka is available
> in my environment.
>
> Thank you again and look forward to hearing more from you or anyone in
> this Solr community.
>
>
> **
> *Sincerely yours,*
>
>
> *Raymond*
>
> On Wed, May 23, 2018 at 8:15 AM, Rahul Singh <rahul.xavier.si...@gmail.com
> > wrote:
>
>> Enumerate the file locations (map) , put them in a queue like rabbit or
>> Kafka (Persist the map), have a bunch of threads , workers, containers,
>> whatever pop off the queue , process the item (reduce).
>>
>>
>> --
>> Rahul Singh
>> rahul.si...@anant.us
>>
>> Anant Corporation
>>
>> On May 20, 2018, 7:24 AM -0400, Raymond Xie <xie3208...@gmail.com>,
>> wrote:
>>
>> I know how to do indexing on file system like single file or folder, but
>> how do I do that in a parallel way? The data I need to index is of huge
>> volume and can't be put on HDFS.
>>
>> Thank you
>>
>> **
>> *Sincerely yours,*
>>
>>
>> *Raymond*
>>
>>
>


Re: How to do parallel indexing on files (not on HDFS)

2018-05-23 Thread Raymond Xie
Thank you Rahul, though that's very high level.

No offense, but do you have a successful implementation, or is it just your
unproven idea? I have never used Rabbit or Kafka before but would be very
interested in more detail on the Kafka idea, as Kafka is available in my
environment.

Thank you again and look forward to hearing more from you or anyone in this
Solr community.


**
*Sincerely yours,*


*Raymond*

On Wed, May 23, 2018 at 8:15 AM, Rahul Singh <rahul.xavier.si...@gmail.com>
wrote:

> Enumerate the file locations (map) , put them in a queue like rabbit or
> Kafka (Persist the map), have a bunch of threads , workers, containers,
> whatever pop off the queue , process the item (reduce).
>
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On May 20, 2018, 7:24 AM -0400, Raymond Xie <xie3208...@gmail.com>, wrote:
>
> I know how to do indexing on file system like single file or folder, but
> how do I do that in a parallel way? The data I need to index is of huge
> volume and can't be put on HDFS.
>
> Thank you
>
> **
> *Sincerely yours,*
>
>
> *Raymond*
>
>
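Rahul's enumerate → queue → workers pattern can be sketched with an in-process queue for illustration; in the real setup a durable broker like Kafka or RabbitMQ replaces queue.Queue, and the handler would POST one file to Solr. This is a sketch under those assumptions, not a production implementation:

```python
import queue
import threading

def index_in_parallel(paths, handler, workers=4):
    """Enumerate items (map), queue them, let workers pop and process (reduce)."""
    q = queue.Queue()
    for p in paths:                      # enumerate the file locations
        q.put(p)

    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                item = q.get_nowait()    # pop an item off the queue
            except queue.Empty:
                return                   # queue drained: worker exits
            out = handler(item)          # process the item, e.g. POST to Solr
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a persistent broker the same worker loop survives restarts, which is the point of the pattern.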


Re: Index filename while indexing JSON file

2018-05-20 Thread Raymond Xie
Would you consider including the filename as another metadata field to be
indexed? I think your downstream Python code can do that easily.


**
*Sincerely yours,*


*Raymond*

On Fri, May 18, 2018 at 3:47 PM, S.Ashwath  wrote:

> Hello,
>
> I have 2 directories: 1 with txt files and the other with corresponding
> JSON (metadata) files (around 9 of each). There is one JSON file for
> each CSV file, and they share the same name (they don't share any other
> fields).
>
> The txt files just have plain text. I mapped each line to a field called
> 'sentence' and included the file name as a field using the data import
> handler. No problems here.
>
> The JSON file has metadata: 3 tags: a URL, author and title (for the
> content in the corresponding txt file).
> When I index the JSON file (I just used the _default schema, and posted the
> fields to the schema, as explained in the official solr tutorial),* I don't
> know how to get the file name into the index as a field.* As far as I know,
> there's no way to use the Data Import Handler for JSON files. I've read that
> I can pass a literal through the bin/post tool, but again, as far as I
> understand, I can't pass in the file name dynamically as a literal.
>
> I NEED to get the file name, it is the only way in which I can associate
> the metadata with each sentence in the txt files in my downstream Python
> code.
>
> So if anybody has a suggestion about how I should index the JSON file name
> along with the JSON content (or even some workaround), I'd be eternally
> grateful.
>
> Regards,
>
> Ash
>
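The suggestion above (injecting each file's own name into the JSON before indexing, since a bin/post literal cannot vary per file) could be done as a pre-processing pass. A sketch; the field name "filename" and the shared-base-name convention are assumptions from the question:

```python
import json
import pathlib

def enrich_with_filename(json_dir, out_dir):
    """Rewrite each JSON metadata doc with its own base name added as a
    field, so the name is indexed and can join metadata to the txt docs."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(pathlib.Path(json_dir).glob("*.json")):
        doc = json.loads(src.read_text())
        doc["filename"] = src.stem           # base name shared with the .txt file
        (out / src.name).write_text(json.dumps(doc))
        written.append(src.name)
    return written
```

The enriched copies in out_dir can then be posted as usual.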


How to do parallel indexing on files (not on HDFS)

2018-05-20 Thread Raymond Xie
I know how to index a file system location like a single file or folder, but
how do I do that in a parallel way? The data I need to index is of huge
volume and can't be put on HDFS.

Thank you

**
*Sincerely yours,*


*Raymond*


Multi threading indexing

2018-05-13 Thread Raymond Xie
Hello,

I have a huge amount of data (TB level) to be indexed. I am wondering if
anyone can share ideas/code for multithreaded indexing?

**
*Sincerely yours,*


*Raymond*
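One common client-side approach (an illustration, not a recommendation from this thread): Solr accepts concurrent update requests, so a thread pool over the file list already parallelizes the posting. Here post_fn stands in for whatever function posts one file, which is an assumption about the surrounding code:

```python
from concurrent.futures import ThreadPoolExecutor

def index_files(paths, post_fn, threads=8):
    """Post many files concurrently; post_fn(path) sends one file to Solr."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order and re-raises worker exceptions
        return list(pool.map(post_fn, paths))
```

Tuning max_workers against the server's indexing capacity usually matters more than the client code itself.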


How to do multi-threading indexing on huge volume of JSON files?

2018-05-08 Thread Raymond Xie
I have a huge amount of JSON files to be indexed in Solr. It takes me 22
minutes to index 300,000 JSON files generated from a single bz2 file, and
that is only 0.25% of the total amount of data from that business flow;
there are 100+ business flows to be indexed.

I absolutely need a good solution for this. At the moment I use post.jar
on a folder, running it in a single thread.

I wonder what is the best practice to do multi-threading indexing? Can
anyone provide detailed example?



**
*Sincerely yours,*


*Raymond*


How to do indexing on remote location

2018-05-08 Thread Raymond Xie
Please don't take this as a joke! Any suggestion is welcome and appreciated.

I have data on a remote WORM drive on a cluster of 3 hosts; each host
contains the same copy of the data.

I have a Solr server on a different host and need to index the data on the
WORM drive.

It is said that indexing can only be done on the local host, or on HDFS if
in the same cluster.

I proposed creating a mapped drive/mount so the Solr server would see the
WORM drive as a local location.

The proposal was rejected today by management, saying a cross mount
potentially introduces risk, and I was asked to figure out a workaround to
do the indexing on a remote host without the cross mount.


Thank you very much.

**
*Sincerely yours,*


*Raymond*


Re: How to create a solr collection providing as much searching flexibility as possible?

2018-04-29 Thread Raymond Xie
Thank you Alessandro,

It looks like my requirement is vague, but indeed I already indicated my
data is in FIX format, which is a tag=value format; here is an example from
the Wiki link in my original question:

8=FIX.4.2 | 9=178 | 35=8 | 49=PHLX | 56=PERS |
52=20071123-05:30:00.000 | 11=ATOMNOCCC9990900 | 20=3 | 150=E | 39=E |
55=MSFT | 167=CS | 54=1 | 38=15 | 40=2 | 44=15 | 58=PHLX EQUITY
TESTING | 59=0 | 47=C | 32=0 | 31=0 | 151=15 | 14=0 | 6=0 | 10=128 |

As the data format is quite special and commonly used in the financial area
(especially for trading data), I believe lots of studies must already have
been made. That's why I want to find out.

Thank you.




**
*Sincerely yours,*


*Raymond*
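For what it's worth, a FIX message like the pipe-rendered example above is just delimiter-separated tag=value pairs, so turning one into an indexable dict takes a few lines. A sketch; the real wire delimiter is SOH (\x01), and "|" is only how logs and the Wiki display it:

```python
SOH = "\x01"   # the actual FIX field delimiter; "|" is common in displays

def parse_fix(message, delimiter=SOH):
    """Split a FIX message into a {tag: value} dict ready to index as JSON."""
    fields = (f for f in message.split(delimiter) if "=" in f)
    return dict(f.strip().split("=", 1) for f in fields)
```

The resulting dict maps tag numbers to values, matching the parsed-JSON examples later in this digest.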

On Sat, Apr 28, 2018 at 11:32 AM, Alessandro Benedetti  wrote:

> Hi Raymond,
> your requirements are quite vague, Solr offers you those capabilities but
> you need to model your configuration and data accordingly.
>
> https://lucene.apache.org/solr/guide/7_3/solr-tutorial.html
> is a good starting point.
> After that you can study your requirements and design the search solution
> accordingly.
>
> Cheers
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


How to create a solr collection providing as much searching flexibility as possible?

2018-04-27 Thread Raymond Xie
I have huge amount of data in FIX format (
https://en.wikipedia.org/wiki/Financial_Information_eXchange)

I want to give the data users the most flexibility in their searches:
usually by trading date range, order id or type, amount, and so on.

Can anyone share any experience on that?

Thanks.




**
*Sincerely yours,*


*Raymond*


adding document to collection failed in Solr cloud mode

2018-04-14 Thread Raymond Xie
Hello,

I need to add all documents in a folder to a collection, and it failed:

Here is my command:

hostname: mysolr
Solr Admin URL: http://mysolr.net:8983/solr/#/
Collection name: collection_indexer
Collection url:
http://mysolr.net:8983/solr/#/collection_indexer_shard1_replica1
data folder:

/tmp/solr_data
Running folder:

bash-4.1$ pwd
/opt/cloudera/parcels/CDH/jars
command:

java -Dtype=application/json -Drecursive -Durl="
http://mysolr.net:8983/solr/#/collection_indexer_shard1_replica1/update/json/docs"
-jar post.jar /tmp/solr_data
Output:

bash-4.1$ java -Dtype=application/json -Drecursive -Durl="
http://mysolr.net:8983/solr/#/collection_indexer_shard1_replica1/update/json/docs"
-jar post.jar /tmp/solr_data SimplePostTool version 1.5 Posting files to
base url
http://mysolr.net:8983/solr/#/collection_indexer_shard1_replica1/update/json/docs
using content-type application/json.. Entering recursive mode, max
depth=999, delay=0s Indexing directory /tmp/solr_data (1 files, depth=0)
POSTing file test.json SimplePostTool: WARNING: Solr returned an error #405
(Method Not Allowed) for url:
http://mysolr.net:8983/solr/#/collection_indexer_shard1_replica1/update/json/docs
SimplePostTool: WARNING: Response: Apache Tomcat/6.0.45 - Error report
HTTP Status 405 - HTTP method POST is not supported by this URL

type Status report

message HTTP method POST is not supported by this URL

description The specified HTTP method is not allowed for the requested
resource.

Apache Tomcat/6.0.45
SimplePostTool: WARNING: IOException while reading response:
java.io.IOException: Server returned HTTP response code: 405 for URL:
http://mysolr.net:8983/solr/#/collection_indexer_shard1_replica1/update/json/docs
1 files indexed. COMMITting Solr index changes to
http://mysolr.net:8983/solr/#/collection_indexer_shard1_replica1/update/json/docs..
Time spent: 0:00:00.100
I also tried
http://mysolr.net:8983/solr/#/collection_indexer/update/json/docs as the
-Durl and got the same error message.

Note that the end of the error message seems to hint that the problem is
with the URL or REST endpoint; can you please clarify what is missing here?

Thank you very much.

**
*Sincerely yours,*


*Raymond*
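One hedged observation on the error above: everything after /#/ in a URL is a browser fragment that is never sent to the server, so the POST likely reached a path without an update handler, which would explain the 405. A sketch of building the REST-style endpoint instead of copying the Admin UI address (collection name taken from the message; verify against your deployment):

```python
def update_endpoint(base, collection):
    """Build the JSON update URL. The Admin UI address (…/solr/#/…) contains
    a browser fragment and must not appear in the REST path."""
    return f"{base.rstrip('/')}/solr/{collection}/update/json/docs"

print(update_endpoint("http://mysolr.net:8983", "collection_indexer"))
# http://mysolr.net:8983/solr/collection_indexer/update/json/docs
```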


Re: How to create my schema and add document, thank you

2018-04-06 Thread Raymond Xie
Thanks.

I have moved this to the next stage:

1. 3 fields are to be extracted from the raw files; the location of the raw
file is the fourth field; all 4 fields become a document.
2. Only the 3 fields will be indexed?
3. The search result should be re-formatted to include the 3 fields and the
content, by parsing the location and retrieving the original raw data's
content.

The current challenge is: the raw data is zipped, and I am not sure how to
process the zipped files. Can Solr handle that?

My plan is below:

A Java (?) based website will be created, with a GUI for accepting the
user's keyword. The keyword will be used to form a POST URL; the URL will
be used to GET a response from Solr. The response will contain the
corresponding message location; the Java program will then fetch the zipped
file and unzip it. As one zip file could contain multiple messages, the
Java program will need to parse out the matched message(s), which will be
shown to the end user together with the other metadata (the three indexed
fields).

Is this a feasible plan? is there better solution?

Thank you very much.
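On the zipped-raw-data question above: Solr itself won't unzip archives, but the retrieval layer of the plan can, using only the standard library. A sketch of the unzip-and-scan step; the substring match is a placeholder for the real matching logic:

```python
import zipfile

def matching_messages(zip_path, keyword):
    """Open one archive and yield (member_name, text) for members whose
    text contains the keyword; one zip may hold multiple messages."""
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            text = zf.read(name).decode("utf-8", errors="replace")
            if keyword in text:          # placeholder for real matching
                yield name, text
```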



**
*Sincerely yours,*


*Raymond*

On Thu, Apr 5, 2018 at 7:11 AM, Adhyan Arizki <a.ari...@gmail.com> wrote:

> Raymond,
>
> 1. Please ensure your Solr instance does indeed load up the correct
> managed-schema file. You do not need to create the file, it should have
> been created automatically in the newer version of Solr out of the box. you
> just need to edit it
> 2. Have you reload your instance after you made the modification?
>
> On Thu, Apr 5, 2018 at 6:56 PM, Raymond Xie <xie3208...@gmail.com> wrote:
>
> >  I have the data ready for index now, it is a json file:
> >
> > {"122": "20180320-08:08:35.038", "49": "VIPER", "382": "0", "151": "1.0",
> > "9": "653", "10071": "20180320-08:08:35.088", "15": "JPY", "56": "XSVC",
> > "54": "1", "10202": "APMKTMAKING", "10537": "XOSE", "10217": "Y", "48":
> > "179492540", "201": "1", "40": "2", "8": "FIX.4.4", "167": "OPT", "421":
> > "JPN", "10292": "115", "10184": "3379122", "456": "101", "11210":
> > "3379122", "1133": "G", "10515": "178", "10": "200", "11032":
> "-1",
> > "10436": "20180320-08:08:35.038", "10518": "178", "11":
> > "3379122", "75":
> > "20180320", "10005": "178", "10104": "Y", "35": "RIO", "10208":
> > "APAC.VIPER.OOE", "59": "0", "60": "20180320-08:08:35.088", "528": "P",
> > "581": "13", "1": "TEST", "202": "25375.0", "455": "179492540", "55":
> > "JNI253D8.OS", "100": "XOSE", "52": "20180320-08:08:35.088", "10241":
> > "viperooe", "150": "A", "10039": "viperooe", "39": "A", "10438":
> "RIO.4.5",
> > "38": "1", "37": "3379122", "372": "D", "660": "102", "44":
> "2.0",
> > "10066": "20180320-08:08:35.038", "29": "4", "50": "JPNIK01", "22":
> "101"}
> >
> > You can inspect the json here: https://jsonformatter.org/
> >
> > I need to create index and enable searching on tags: 37, 75 and 10242
> > (where available, this sample message doesn't have it)
> >
> > My understanding is I need to create the file managed-schema, I added two
> > fields as below:
> >
> > <field name="37" type="string" indexed="true" stored="false" multiValued="true"/>
> > <field name="75" type="string" indexed="true" stored="false" multiValued="true"/>
> >
> > Then I go back to Solr Admin, I don't see the two new fields in Schema
> > section
> >
> > Anything I am missing here? and once the two fields are put in the
> > managed-schema, can I add the json file through upload in Solr Admin?
> >
> > Thank you very much.
> >
> >
> > **
> > *Sincerely yours,*
> >
> >
> > *Raymond*
> >
>
>
>
> --
>
> Best regards,
> Adhyan Arizki
>


Urgent! How to retrieve the whole message in the Solr search result?

2018-04-06 Thread Raymond Xie
I am using Solr for the following search need:

Raw data: in FIX format; it's OK if you don't know what it is, just treat it
as CSV with a special delimiter.

Parsed data: derived from the raw data, as JSON documents with 100+ fields.

Example:

Raw data: delimiter is \u001:

8=FIX.4.4 9=653 35=RIO 1=TEST 11=3379122 38=1 44=2.0 39=A 40=2
49=VIPER 50=JPNIK01 54=1 55=JNI253D8.OS 56=XSVC 59=0 75=20180350 100=XOSE
10039=viperooe 10241=viperooe 150=A 372=D 122=20180320-08:08:35.038
10066=20180320-08:08:35.038 10436=20180320-08:08:35.038 202=25375.0
52=20180320-08:08:35.088 60=20180320-08:08:35.088
10071=20180320-08:08:35.088 11210=3379122 37=3379122
10184=3379122 201=1 29=4 10438=RIO.4.5 10005=178 10515=178
10518=178 581=13 660=102 1133=G 528=P 10104=Y 10202=APMKTMAKING
10208=APAC.VIPER.OOE 10217=Y 10292=115 11032=-1 382=0 10537=XOSE 15=JPY
167=OPT 48=179492540 455=179492540 22=101 456=101 151=1.0 421=JPN 10=200

Parsed data: in json:

{"122": "20180320-08:08:35.038", "49": "VIPER", "382": "0", "151": "1.0",
"9": "653", "10071": "20180320-08:08:35.088", "15": "JPY", "56": "XSVC",
"54": "1", "10202": "APMKTMAKING", "10537": "XOSE", "10217": "Y", "48":
"179492540", "201": "1", "40": "2", "8": "FIX.4.4", "167": "OPT", "421":
"JPN", "10292": "115", "10184": "3379122", "456": "101", "11210":
"3379122", "1133": "G", "10515": "178", "10": "200", "11032": "-1",
"10436": "20180320-08:08:35.038", "10518": "178", "11":
"3379122", *"75":
"20180320"*, "10005": "178", "10104": "Y", "35": "RIO", "10208":
"APAC.VIPER.OOE", "59": "0", "60": "20180320-08:08:35.088", "528": "P",
"581": "13", "1": "TEST", "202": "25375.0", "455": "179492540", "55":
"JNI253D8.OS", "100": "XOSE", "52": "20180320-08:08:35.088", "10241":
"viperooe", "150": "A", "10039": "viperooe", "39": "A", "10438": "RIO.4.5",
"38": "1", *"37": "3379122"*, "372": "D", "660": "102", "44":
"2.0", "10066": "20180320-08:08:35.038", "29": "4", "50": "JPNIK01", "22":
"101"}

The fields used for searching are order_id (tag 37) and trd_date (tag 75). I
will create the schema with the two fields added to it.




At the moment I can get the result by:
http://192.168.112.141:8983/solr/fix_messages/select?q=37:3379122
where 37 is the order_id tag and 3379122 is the value to search for in
field "37".


The result I get is:

{
  "responseHeader":{
"status":0,
"QTime":6,
"params":{
  "q":"37:3379122"}},
  "response":{"numFound":1,"start":0,"docs":[
  {
"122":["20180320-08:08:35.038"],
"49":["VIPER"],
"382":[0],
"151":[1.0],
"9":[653],
"10071":["20180320-08:08:35.088"],
"15":["JPY"],
"56":["XSVC"],
"54":[1],
"10202":["APMKTMAKING"],



I need to show the result like below:

1. the order_id: the term "order_id" must be displayed instead of its
actual tag 37;
2. the trd_date: the term "trd_date" must be displayed in the result;
3. the whole message: the whole raw message must be displayed in the
result;
4. the two fields order_id and trd_date must be highlighted.

Can anyone tell me how I do this? Thank you very much in advance.
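One way to approach points 1, 2 and 4 is Solr's field aliasing in the `fl` parameter (`display_name:stored_field`) together with the highlighting parameters; point 3 would require the whole raw message to have been stored in its own field at index time (the field name `raw_message` below is an assumption, not something the index above already has). A sketch of building such a query:

```python
from urllib.parse import urlencode

# Build a /select query that renames tags in the response via fl aliasing
# and asks for highlighting on the two searched fields.
# "raw_message" is a hypothetical field holding the whole raw FIX line.
params = {
    "q": "37:3379122",
    "fl": "order_id:37,trd_date:75,raw_message",
    "hl": "on",
    "hl.fl": "37,75",
}
query = "http://192.168.112.141:8983/solr/fix_messages/select?" + urlencode(params)
print(query)
```

With this, the response `docs` would carry keys `order_id` and `trd_date` instead of the bare tag numbers, and the `highlighting` section would mark the matches.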

**
*Sincerely yours,*


*Raymond*


How to create my schema and add document, thank you

2018-04-05 Thread Raymond Xie
I have the data ready for indexing now; it is a JSON file:

{"122": "20180320-08:08:35.038", "49": "VIPER", "382": "0", "151": "1.0",
"9": "653", "10071": "20180320-08:08:35.088", "15": "JPY", "56": "XSVC",
"54": "1", "10202": "APMKTMAKING", "10537": "XOSE", "10217": "Y", "48":
"179492540", "201": "1", "40": "2", "8": "FIX.4.4", "167": "OPT", "421":
"JPN", "10292": "115", "10184": "3379122", "456": "101", "11210":
"3379122", "1133": "G", "10515": "178", "10": "200", "11032": "-1",
"10436": "20180320-08:08:35.038", "10518": "178", "11":
"3379122", "75":
"20180320", "10005": "178", "10104": "Y", "35": "RIO", "10208":
"APAC.VIPER.OOE", "59": "0", "60": "20180320-08:08:35.088", "528": "P",
"581": "13", "1": "TEST", "202": "25375.0", "455": "179492540", "55":
"JNI253D8.OS", "100": "XOSE", "52": "20180320-08:08:35.088", "10241":
"viperooe", "150": "A", "10039": "viperooe", "39": "A", "10438": "RIO.4.5",
"38": "1", "37": "3379122", "372": "D", "660": "102", "44": "2.0",
"10066": "20180320-08:08:35.038", "29": "4", "50": "JPNIK01", "22": "101"}

You can inspect the json here: https://jsonformatter.org/

I need to create an index and enable searching on tags 37, 75 and 10242
(where available; this sample message doesn't have 10242).

My understanding is I need to create the file managed-schema; I added two
fields as below:

<field name="37" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="75" type="string" indexed="true" stored="false" multiValued="true"/>
Then I go back to Solr Admin, and I don't see the two new fields in the
Schema section.

Am I missing anything here? And once the two fields are put in the
managed-schema, can I add the JSON file through upload in Solr Admin?

Thank you very much.
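One likely reason the new fields don't show up: hand edits to managed-schema are generally not picked up by a running Solr until the core is reloaded, which is why the Schema API is the usual route for adding fields. A sketch, assuming a core named fix_messages (the field attributes here are illustrative):

```python
import json
from urllib import request

# Schema API payload adding the two tag fields.
payload = json.dumps({
    "add-field": [
        {"name": "37", "type": "string", "indexed": True,
         "stored": True, "multiValued": True},
        {"name": "75", "type": "string", "indexed": True,
         "stored": True, "multiValued": True},
    ]
})
print(payload)

def post_schema(payload, url="http://localhost:8983/solr/fix_messages/schema"):
    """POST the commands to the Schema API (requires a running Solr)."""
    req = request.Request(url, data=payload.encode(),
                          headers={"Content-type": "application/json"})
    return request.urlopen(req).read()

# post_schema(payload)  # uncomment with Solr running
```

Fields added this way appear in the Admin UI's Schema screen immediately, with no manual reload.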


**
*Sincerely yours,*


*Raymond*


Re: Need help to get started on Solr, searching get nothing. Thank you very much in advance

2018-04-04 Thread Raymond Xie
I have the data ready for indexing now; it is a JSON file:

{"122": "20180320-08:08:35.038", "49": "VIPER", "382": "0", "151": "1.0",
"9": "653", "10071": "20180320-08:08:35.088", "15": "JPY", "56": "XSVC",
"54": "1", "10202": "APMKTMAKING", "10537": "XOSE", "10217": "Y", "48":
"179492540", "201": "1", "40": "2", "8": "FIX.4.4", "167": "OPT", "421":
"JPN", "10292": "115", "10184": "3379122", "456": "101", "11210":
"3379122", "1133": "G", "10515": "178", "10": "200", "11032": "-1",
"10436": "20180320-08:08:35.038", "10518": "178", "11":
"3379122", "75":
"20180320", "10005": "178", "10104": "Y", "35": "RIO", "10208":
"APAC.VIPER.OOE", "59": "0", "60": "20180320-08:08:35.088", "528": "P",
"581": "13", "1": "TEST", "202": "25375.0", "455": "179492540", "55":
"JNI253D8.OS", "100": "XOSE", "52": "20180320-08:08:35.088", "10241":
"viperooe", "150": "A", "10039": "viperooe", "39": "A", "10438": "RIO.4.5",
"38": "1", "37": "337912000000002", "372": "D", "660": "102", "44": "2.0",
"10066": "20180320-08:08:35.038", "29": "4", "50": "JPNIK01", "22": "101"}

You can inspect the json here: https://jsonformatter.org/

I need to create an index and enable searching on tags 37, 75 and 10242
(where available; this sample message doesn't have 10242).

My understanding is I need to create the file managed-schema; I added two
fields as below:

<field name="37" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="75" type="string" indexed="true" stored="false" multiValued="true"/>
Then I go back to Solr Admin, and I don't see the two new fields in the
Schema section.

Am I missing anything here? And once the two fields are put in the
managed-schema, can I add the JSON file through upload in Solr Admin?

Thank you very much.
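As for uploading the JSON file itself: besides the Admin UI's Documents tab, it can be posted straight to the core's `/update/json/docs` handler (this is essentially what `bin/post` does). A sketch, assuming a core named fix_messages:

```python
import json
from urllib import request

SOLR = "http://localhost:8983/solr/fix_messages"

def index_json(path):
    """POST one JSON document file to Solr and commit (requires running Solr)."""
    with open(path, "rb") as f:
        body = f.read()
    req = request.Request(SOLR + "/update/json/docs?commit=true", data=body,
                          headers={"Content-type": "application/json"})
    return request.urlopen(req).read()

# Build the target URL only, so this can be inspected without a live Solr:
update_url = SOLR + "/update/json/docs?commit=true"
print(update_url)
```

`commit=true` makes the document searchable immediately; for bulk loads a single commit at the end is cheaper.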






**
*Sincerely yours,*


*Raymond*

On Mon, Apr 2, 2018 at 9:04 AM, Rick Leir <rl...@leirtech.com> wrote:

> Raymond
> There is a default field normally called df. You would normally use
> Copyfield to copy all searchable fields into the default field.
> Cheers -- Rick
>
> On April 1, 2018 11:34:07 PM EDT, Raymond Xie <xie3208...@gmail.com>
> wrote:
> >Hi Rick,
> >
> >I sorted it out half:
> >
> >I should have specified the field in the search query, so, instead of
> >http://localhost:8983/solr/films/browse?q=batman, I should use:
> >http://localhost:8983/solr/films/browse?q=name:batman
> >
> >Sorry for this newbie mistake.
> >
> >But what about if I/user doesn't know or doesn't want to specify the
> >search
> >scope to be restricted in field "name" but anywhere in the index'ed
> >documents?
> >
> >
> >**
> >*Sincerely yours,*
> >
> >
> >*Raymond*
> >
> >On Sun, Apr 1, 2018 at 2:10 PM, Rick Leir <rl...@leirtech.com> wrote:
> >
> >> Raymond
> >> The output is not visible to me because the mailing list strips
> >images.
> >> Please try a different way to show the output.
> >> Cheers -- Rick
> >>
> >> On March 29, 2018 10:17:13 PM EDT, Raymond Xie <xie3208...@gmail.com>
> >> wrote:
> >> > I am new to Solr, following Steve Rowe's example on
> >>
> >>https://github.com/apache/lucene-solr/tree/master/solr/example/films:
> >> >
> >> >It would be greatly appreciated if anyone can enlighten me where to
> >> >start
> >> >troubleshooting, thank you very much in advance.
> >> >
> >> >The steps I followed are:
> >> >
> >> >Here ya go << END_OF_SCRIPT
> >> >
> >> >bin/solr stop
> >> >rm server/logs/*.log
> >> >rm -Rf server/solr/films/
> >> >bin/solr start
> >> >bin/solr create -c films
> >> >curl http://localhost:8983/solr/films/schema -X POST -H
> >> >'Content-type:application/json' --data-binary '{
> >> >"add-field" : {
> >> >"name":"name",
> >> >"type":"text_general",
> >> >"multiValued":false,
> >> >"stored":true
> >> >},
> >> >"add-field" : {
> >> >"name":"initial_release_date",
> >> >"type":"pdate",
> >> >"stored":true
> >> >}
> >> >}'
> >> >bin/post -c films example/films/films.json
> >> >curl http://localhost:8983/solr/films/config/params -H
> >> >'Content-type:application/json'  -d '{
> >> >"update" : {
> >> >  "facets": {
> >> >"facet.field":"genre"
> >> >}
> >> >  }
> >> >}'
> >> >
> >> ># END_OF_SCRIPT
> >> >
> >> >Additional fun -
> >> >
> >> >Add highlighting:
> >> >curl http://localhost:8983/solr/films/config/params -H
> >> >'Content-type:application/json'  -d '{
> >> >"set" : {
> >> >  "browse": {
> >> >"hl":"on",
> >> >"hl.fl":"name"
> >> >}
> >> >  }
> >> >}'
> >> >try http://localhost:8983/solr/films/browse?q=batman now, and you'll
> >> >see "batman" highlighted in the results
> >> >
> >> >
> >> >
> >> >I got nothing in my search:
> >> >
> >> >
> >> >
> >> >
> >> >**
> >> >*Sincerely yours,*
> >> >
> >> >
> >> >*Raymond*
> >>
> >> --
> >> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com


Re: How do I create a schema file for FIX data in Solr

2018-04-03 Thread Raymond Xie
I'm talking to the author to find out, thanks.

~~~sent from my cell phone, sorry if there is any typo

Adhyan Arizki <a.ari...@gmail.com> wrote on Tue, Apr 3, 2018 at 1:38 PM:

> Raymond,
>
> Seems you are having an issue with the Node environment. Likely the path
> isn't registered correctly, judging from the error message. Note, though,
> that this is no longer a Solr issue.
>
> On Tue, 3 Apr 2018, 23:00 Raymond Xie, <xie3208...@gmail.com> wrote:
>
> > Hi Rick,
> >
> > Following your suggestion I found
> https://github.com/SunGard-Labs/fix2json
> > which seems to be a fit;
> >
> > I followed the installation instruction and successfully installed the
> > fix2json on my Ubuntu host.
> >
> > sudo npm install -g fix2json
> >
> > I ran the same command as indicated in the git:
> >
> > fix2json -p dict/FIX50SP2.CME.xml XCME_MD_GE_FUT_20160315.gz
> >
> >
> > and I received error of:
> >
> > /usr/bin/env: ‘node’: No such file or directory
> >
> > It would be appreciated if you could point out what is missing here.
> >
> > Thank you again for your kind help.
> >
> >
> >
> > **
> > *Sincerely yours,*
> >
> >
> > *Raymond*
> >
> > On Mon, Apr 2, 2018 at 9:30 AM, Raymond Xie <xie3208...@gmail.com>
> wrote:
> >
> > > Thank you Rick for the enlightening.
> > >
> > > I will get the FIX message parsed first and come back here later.
> > >
> > >
> > > *------------*
> > > *Sincerely yours,*
> > >
> > >
> > > *Raymond*
> > >
> > > On Mon, Apr 2, 2018 at 9:15 AM, Rick Leir <rl...@leirtech.com> wrote:
> > >
> > >> Google
> > >>fix to json,
> > >> there are a few interesting leads.
> > >>
> > >> On April 2, 2018 12:34:44 AM EDT, Raymond Xie <xie3208...@gmail.com>
> > >> wrote:
> > >> >Thank you, Shawn, Rick and other readers,
> > >> >
> > >> >To Shawn:
> > >> >
> > >> >For  *8=FIX.4.4 9=653 35=RIO* as an example, in the FIX standard: 8
> > >> >means BeginString, in this example, its value is  FIX.4.4.9, and 9
> > >> >means
> > >> >body length, it is 653 for this message, 35 is RIO, meaning the
> message
> > >> >type is RIO, 122 stands for OrigSendingTime and has a format of
> > >> >UTCTimestamp
> > >> >
> > >> >You can refer to this page for details:
> > >> >https://www.onixs.biz/fix-dictionary/4.2/fields_by_tag.html
> > >> >
> > >> >All the values are explained as string type.
> > >> >
> > >> >All the tag numbers are from FIX standard so it doesn't change (in my
> > >> >case)
> > >> >
> > >> >I expect a python program might be needed to parse the message and
> > >> >extract
> > >> >each tag's value, index is to be made on those extracted value as
> long
> > >> >as
> > >> >their field (tag) name.
> > >> >
> > >> >With index in place, ideally and naturally user will search for any
> > >> >keyword, however, in this case, most queries would be based on tag 37
> > >> >(Order ID) and 75 (Trade Date), there is another customized tag (not
> in
> > >> >the
> > >> >standard) Order Version to be queried on.
> > >> >
> > >> >I understand the parser creation would be a manual process, as long
> as
> > >> >I
> > >> >know or have a small sample program, I will do it myself and maybe
> > >> >adjust
> > >> >it as per need.
> > >> >
> > >> >To Rick:
> > >> >
> > >> >You mentioned creating JSON document, my understanding is a parser
> > >> >would be
> > >> >needed to generate that JSON document, do you have any existing
> example
> > >> >code?
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >Thank you guys very much.
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >**
> > &g

Re: How do I create a schema file for FIX data in Solr

2018-04-03 Thread Raymond Xie
Hi Rick,

Following your suggestion I found https://github.com/SunGard-Labs/fix2json
which seems to be a fit;

I followed the installation instruction and successfully installed the
fix2json on my Ubuntu host.

sudo npm install -g fix2json

I ran the same command as indicated in the git:

fix2json -p dict/FIX50SP2.CME.xml XCME_MD_GE_FUT_20160315.gz


and I received error of:

/usr/bin/env: ‘node’: No such file or directory

It would be appreciated if you could point out what is missing here.

Thank you again for your kind help.



**
*Sincerely yours,*


*Raymond*

On Mon, Apr 2, 2018 at 9:30 AM, Raymond Xie <xie3208...@gmail.com> wrote:

> Thank you Rick for the enlightening.
>
> I will get the FIX message parsed first and come back here later.
>
>
> **
> *Sincerely yours,*
>
>
> *Raymond*
>
> On Mon, Apr 2, 2018 at 9:15 AM, Rick Leir <rl...@leirtech.com> wrote:
>
>> Google
>>fix to json,
>> there are a few interesting leads.
>>
>> On April 2, 2018 12:34:44 AM EDT, Raymond Xie <xie3208...@gmail.com>
>> wrote:
>> >Thank you, Shawn, Rick and other readers,
>> >
>> >To Shawn:
>> >
>> >For *8=FIX.4.4 9=653 35=RIO* as an example, in the FIX standard: 8
>> >means BeginString, and in this example its value is FIX.4.4; 9 means
>> >body length, which is 653 for this message; 35 is the message type,
>> >which is RIO here; 122 stands for OrigSendingTime and has a format of
>> >UTCTimestamp.
>> >
>> >You can refer to this page for details:
>> >https://www.onixs.biz/fix-dictionary/4.2/fields_by_tag.html
>> >
>> >All the values are explained as string type.
>> >
>> >All the tag numbers are from FIX standard so it doesn't change (in my
>> >case)
>> >
>> >I expect a python program might be needed to parse the message and
>> >extract
>> >each tag's value, index is to be made on those extracted value as long
>> >as
>> >their field (tag) name.
>> >
>> >With index in place, ideally and naturally user will search for any
>> >keyword, however, in this case, most queries would be based on tag 37
>> >(Order ID) and 75 (Trade Date), there is another customized tag (not in
>> >the
>> >standard) Order Version to be queried on.
>> >
>> >I understand the parser creation would be a manual process, as long as
>> >I
>> >know or have a small sample program, I will do it myself and maybe
>> >adjust
>> >it as per need.
>> >
>> >To Rick:
>> >
>> >You mentioned creating JSON document, my understanding is a parser
>> >would be
>> >needed to generate that JSON document, do you have any existing example
>> >code?
>> >
>> >
>> >
>> >
>> >Thank you guys very much.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >**
>> >*Sincerely yours,*
>> >
>> >
>> >*Raymond*
>> >
>> >On Sun, Apr 1, 2018 at 2:16 PM, Shawn Heisey <apa...@elyograg.org>
>> >wrote:
>> >
>> >> On 4/1/2018 10:12 AM, Raymond Xie wrote:
>> >>
>> >>> FIX is a format standard of financial data. It contains lots of tags
>> >in
>> >>> number with value for the tag, like 8=asdf, where 8 is the tag and
>> >asdf is
>> >>> the tag's value. Each tag has its definition.
>> >>>
>> >>> The sample msg in FIX format was in the original question.
>> >>>
>> >>> All I need to do is to know how to paste the msg and get all tag's
>> >value.
>> >>>
>> >>> I found so far a parser is what I need to start with., But I am more
>> >>> concerning about how to create index in Solr on the extracted tag's
>> >value,
>> >>> that is the first step, the next would be to customize the dashboard
>> >for
>> >>> users to search with a value to find out which msg contains that
>> >value in
>> >>> which tag and present users the whole msg as proof.
>> >>>
>> >>
>> >> Most of Solr's functionality is provided by Lucene.  Lucene is a java
>> >API
>> >> that implements search functionality.  Solr bolts on some
>> >functionality on
>> >> t

Re: Need help to get started on Solr, searching get nothing. Thank you very much in advance

2018-04-02 Thread Raymond Xie
Thanks Rick and Adhyan

I see there is "/browse" in solrconfig.xml:

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
</requestHandler>

and an <initParams> section whose <lst name="defaults"> has one item, "df",
as shown below:

<initParams path="/update/**,/query,/select,/browse">
  <lst name="defaults">
    <str name="df">_text_</str>
  </lst>
</initParams>

My understanding is that I can add whatever fields I want to index and
search here, alongside _text_. Am I correct?

Thanks.
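Not exactly: `df` only names which single field is searched by default, so listing more fields there won't make them searchable. One common pattern instead is a catch-all copyField rule in the managed-schema that funnels every field into the default field. A sketch (the wildcard source is illustrative; narrower sources work too):

```xml
<!-- copy every field's content into the catch-all default search field -->
<copyField source="*" dest="_text_"/>
```

With that in place, queries without a field prefix search across everything copied into `_text_`.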




**
*Sincerely yours,*


*Raymond*

On Mon, Apr 2, 2018 at 3:24 PM, Adhyan Arizki <a.ari...@gmail.com> wrote:

> Raymond,
>
> You can specify the default behavior in solrconfig.xml under each handler.
> For instance for /browse you can specify it should look into name, and for
> /query you can default it to different field.
>
> On Mon, Apr 2, 2018 at 9:04 PM, Rick Leir <rl...@leirtech.com> wrote:
>
> > Raymond
> > There is a default field normally called df. You would normally use
> > Copyfield to copy all searchable fields into the default field.
> > Cheers -- Rick
> >
> > On April 1, 2018 11:34:07 PM EDT, Raymond Xie <xie3208...@gmail.com>
> > wrote:
> > >Hi Rick,
> > >
> > >I sorted it out half:
> > >
> > >I should have specified the field in the search query, so, instead of
> > >http://localhost:8983/solr/films/browse?q=batman, I should use:
> > >http://localhost:8983/solr/films/browse?q=name:batman
> > >
> > >Sorry for this newbie mistake.
> > >
> > >But what about if I/user doesn't know or doesn't want to specify the
> > >search
> > >scope to be restricted in field "name" but anywhere in the index'ed
> > >documents?
> > >
> > >
> > >**
> > >*Sincerely yours,*
> > >
> > >
> > >*Raymond*
> > >
> > >On Sun, Apr 1, 2018 at 2:10 PM, Rick Leir <rl...@leirtech.com> wrote:
> > >
> > >> Raymond
> > >> The output is not visible to me because the mailing list strips
> > >images.
> > >> Please try a different way to show the output.
> > >> Cheers -- Rick
> > >>
> > >> On March 29, 2018 10:17:13 PM EDT, Raymond Xie <xie3208...@gmail.com>
> > >> wrote:
> > >> > I am new to Solr, following Steve Rowe's example on
> > >>
> > >>https://github.com/apache/lucene-solr/tree/master/solr/example/films:
> > >> >
> > >> >It would be greatly appreciated if anyone can enlighten me where to
> > >> >start
> > >> >troubleshooting, thank you very much in advance.
> > >> >
> > >> >The steps I followed are:
> > >> >
> > >> >Here ya go << END_OF_SCRIPT
> > >> >
> > >> >bin/solr stop
> > >> >rm server/logs/*.log
> > >> >rm -Rf server/solr/films/
> > >> >bin/solr start
> > >> >bin/solr create -c films
> > >> >curl http://localhost:8983/solr/films/schema -X POST -H
> > >> >'Content-type:application/json' --data-binary '{
> > >> >"add-field" : {
> > >> >"name":"name",
> > >> >"type":"text_general",
> > >> >"multiValued":false,
> > >> >"stored":true
> > >> >},
> > >> >"add-field" : {
> > >> >"name":"initial_release_date",
> > >> >"type":"pdate",
> > >> >"stored":true
> > >> >}
> > >> >}'
> > >> >bin/post -c films example/films/films.json
> > >> >curl http://localhost:8983/solr/films/config/params -H
> > >> >'Content-type:application/json'  -d '{
> > >> >"update" : {
> > >> >  "facets": {
> > >> >"facet.field":"genre"
> > >> >}
> > >> >  }
> > >> >}'
> > >> >
> > >> ># END_OF_SCRIPT
> > >> >
> > >> >Additional fun -
> > >> >
> > >> >Add highlighting:
> > >> >curl http://localhost:8983/solr/films/config/params -H
> > >> >'Content-type:application/json'  -d '{
> > >> >"set" : {
> > >> >  "browse": {
> > >> >"hl":"on",
> > >> >"hl.fl":"name"
> > >> >}
> > >> >  }
> > >> >}'
> > >> >try http://localhost:8983/solr/films/browse?q=batman now, and you'll
> > >> >see "batman" highlighted in the results
> > >> >
> > >> >
> > >> >
> > >> >I got nothing in my search:
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >**
> > >> >*Sincerely yours,*
> > >> >
> > >> >
> > >> >*Raymond*
> > >>
> > >> --
> > >> Sorry for being brief. Alternate email is rickleir at yahoo dot com
> >
> > --
> > Sorry for being brief. Alternate email is rickleir at yahoo dot com
> >
>
>
>
> --
>
> Best regards,
> Adhyan Arizki
>


Re: How do I create a schema file for FIX data in Solr

2018-04-02 Thread Raymond Xie
Thank you Rick for the enlightening.

I will get the FIX message parsed first and come back here later.


**
*Sincerely yours,*


*Raymond*

On Mon, Apr 2, 2018 at 9:15 AM, Rick Leir <rl...@leirtech.com> wrote:

> Google
>fix to json,
> there are a few interesting leads.
>
> On April 2, 2018 12:34:44 AM EDT, Raymond Xie <xie3208...@gmail.com>
> wrote:
> >Thank you, Shawn, Rick and other readers,
> >
> >To Shawn:
> >
> >For  *8=FIX.4.4 9=653 35=RIO* as an example, in the FIX standard: 8
> >means BeginString, in this example, its value is  FIX.4.4.9, and 9
> >means
> >body length, it is 653 for this message, 35 is RIO, meaning the message
> >type is RIO, 122 stands for OrigSendingTime and has a format of
> >UTCTimestamp
> >
> >You can refer to this page for details:
> >https://www.onixs.biz/fix-dictionary/4.2/fields_by_tag.html
> >
> >All the values are explained as string type.
> >
> >All the tag numbers are from FIX standard so it doesn't change (in my
> >case)
> >
> >I expect a python program might be needed to parse the message and
> >extract
> >each tag's value, index is to be made on those extracted value as long
> >as
> >their field (tag) name.
> >
> >With index in place, ideally and naturally user will search for any
> >keyword, however, in this case, most queries would be based on tag 37
> >(Order ID) and 75 (Trade Date), there is another customized tag (not in
> >the
> >standard) Order Version to be queried on.
> >
> >I understand the parser creation would be a manual process, as long as
> >I
> >know or have a small sample program, I will do it myself and maybe
> >adjust
> >it as per need.
> >
> >To Rick:
> >
> >You mentioned creating JSON document, my understanding is a parser
> >would be
> >needed to generate that JSON document, do you have any existing example
> >code?
> >
> >
> >
> >
> >Thank you guys very much.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >**
> >*Sincerely yours,*
> >
> >
> >*Raymond*
> >
> >On Sun, Apr 1, 2018 at 2:16 PM, Shawn Heisey <apa...@elyograg.org>
> >wrote:
> >
> >> On 4/1/2018 10:12 AM, Raymond Xie wrote:
> >>
> >>> FIX is a format standard of financial data. It contains lots of tags
> >in
> >>> number with value for the tag, like 8=asdf, where 8 is the tag and
> >asdf is
> >>> the tag's value. Each tag has its definition.
> >>>
> >>> The sample msg in FIX format was in the original question.
> >>>
> >>> All I need to do is to know how to paste the msg and get all tag's
> >value.
> >>>
> >>> I found so far a parser is what I need to start with., But I am more
> >>> concerning about how to create index in Solr on the extracted tag's
> >value,
> >>> that is the first step, the next would be to customize the dashboard
> >for
> >>> users to search with a value to find out which msg contains that
> >value in
> >>> which tag and present users the whole msg as proof.
> >>>
> >>
> >> Most of Solr's functionality is provided by Lucene.  Lucene is a java
> >API
> >> that implements search functionality.  Solr bolts on some
> >functionality on
> >> top of Lucene, but doesn't really do anything to fundamentally change
> >the
> >> fact that you're dealing with a Lucene index.  So I'm going to mostly
> >talk
> >> about Lucene below.
> >>
> >> Lucene organizes data in a unit that we call a "document." An easy
> >analogy
> >> for this is that it is a lot like a row in a single database table.
> >It has
> >> fields, each field has a type. Unless custom software is used, there
> >is
> >> really no support for data other than basic primitive types --
> >numbers and
> >> strings.  The only complex type that I can think of that Solr
> >supports out
> >> of the box is geospatial coordinates, and it might even support
> >> multi-dimensional coordinates, but I'm not sure.  It's not all that
> >complex
> >> -- the field just stores and manipulates multiple numbers instead of
> >one.
> >> The Lucene API does support a FEW things that Solr doesn't implement.
> > I
> >> don't think those are applicable to what you're trying to 

Re: How do I create a schema file for FIX data in Solr

2018-04-01 Thread Raymond Xie
Thank you, Shawn, Rick and other readers,

To Shawn:

For *8=FIX.4.4 9=653 35=RIO* as an example, in the FIX standard: 8
means BeginString, and in this example its value is FIX.4.4; 9 means body
length, which is 653 for this message; 35 is the message type, which is RIO
here; 122 stands for OrigSendingTime and has a format of UTCTimestamp.

You can refer to this page for details:
https://www.onixs.biz/fix-dictionary/4.2/fields_by_tag.html

All the values are explained as string type.

All the tag numbers are from the FIX standard, so they don't change (in my case).

I expect a Python program might be needed to parse the message and extract
each tag's value; an index is to be made on those extracted values along
with their field (tag) names.

With the index in place, ideally users would search for any keyword; in this
case, however, most queries would be based on tag 37 (Order ID) and tag 75
(Trade Date). There is also a customized tag (not in the standard), Order
Version, to be queried on.

I understand the parser creation would be a manual process; as long as I
know of or have a small sample program, I will do it myself and adjust it
as needed.
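A minimal sketch of such a parser in Python, assuming SOH-delimited (\u0001) tag=value pairs like the sample message; splitting each pair on the first '=' only keeps values that themselves contain '=' intact:

```python
import json

def parse_fix(message, delimiter="\x01"):
    """Split a raw FIX message into a {tag: value} dict."""
    fields = {}
    for pair in message.strip(delimiter).split(delimiter):
        if not pair:
            continue  # skip empty fragments from doubled delimiters
        tag, _, value = pair.partition("=")
        fields[tag] = value
    return fields

# Shortened version of the sample message from this thread
sample = "8=FIX.4.4\x019=653\x0135=RIO\x0137=3379122\x0175=20180320\x01"
doc = parse_fix(sample)
print(json.dumps(doc))
```

The resulting dict serializes directly to the kind of JSON document shown earlier in the thread, ready for posting to Solr.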

To Rick:

You mentioned creating a JSON document. My understanding is a parser would
be needed to generate that JSON document; do you have any existing example
code?




Thank you guys very much.









**
*Sincerely yours,*


*Raymond*

On Sun, Apr 1, 2018 at 2:16 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 4/1/2018 10:12 AM, Raymond Xie wrote:
>
>> FIX is a format standard of financial data. It contains lots of tags in
>> number with value for the tag, like 8=asdf, where 8 is the tag and asdf is
>> the tag's value. Each tag has its definition.
>>
>> The sample msg in FIX format was in the original question.
>>
>> All I need to do is to know how to parse the msg and get each tag's value.
>>
>> I found so far that a parser is what I need to start with. But I am more
>> concerned about how to create the index in Solr on the extracted tag
>> values; that is the first step. The next would be to customize the
>> dashboard for users to search with a value to find out which msg contains
>> that value in which tag, and present users the whole msg as proof.
>>
>
> Most of Solr's functionality is provided by Lucene.  Lucene is a java API
> that implements search functionality.  Solr bolts on some functionality on
> top of Lucene, but doesn't really do anything to fundamentally change the
> fact that you're dealing with a Lucene index.  So I'm going to mostly talk
> about Lucene below.
>
> Lucene organizes data in a unit that we call a "document." An easy analogy
> for this is that it is a lot like a row in a single database table.  It has
> fields, each field has a type. Unless custom software is used, there is
> really no support for data other than basic primitive types -- numbers and
> strings.  The only complex type that I can think of that Solr supports out
> of the box is geospatial coordinates, and it might even support
> multi-dimensional coordinates, but I'm not sure.  It's not all that complex
> -- the field just stores and manipulates multiple numbers instead of one.
> The Lucene API does support a FEW things that Solr doesn't implement.  I
> don't think those are applicable to what you're trying to do.
>
> Let's look at the first part of the data that you included in the first
> message:
>
> 8=FIX.4.4 9=653 35=RIO
>
> Is "8" always a mixture of letters and numbers and periods? Is "9" always
> a number, and is it always a WHOLE number?  Is "35" always letters?
> Looking deeper to data that I didn't quote ... is "122" always a date/time
> value?  Are the tag numbers always picked from a well-defined set, or do
> they change?
>
> Assuming that the answers in the previous paragraph are found and a
> configuration is created to deal with all of it ... how are you planning to
> search it?  What kind of queries would you expect somebody to make?  That's
> going to have a huge influence on how you configure things.
>
> Writing the schema is usually where people spend the most time when
> they're setting up Solr.
>
> Thanks,
> Shawn
>
>


Re: Need help to get started on Solr, searching get nothing. Thank you very much in advance

2018-04-01 Thread Raymond Xie
Hi Rick,

I half sorted it out:

I should have specified the field in the search query, so, instead of
http://localhost:8983/solr/films/browse?q=batman, I should use:
http://localhost:8983/solr/films/browse?q=name:batman

Sorry for this newbie mistake.

But what if the user doesn't know, or doesn't want, the search scope to be
restricted to the field "name", and instead wants to search anywhere in the
indexed documents?


**
*Sincerely yours,*


*Raymond*

On Sun, Apr 1, 2018 at 2:10 PM, Rick Leir <rl...@leirtech.com> wrote:

> Raymond
> The output is not visible to me because the mailing list strips images.
> Please try a different way to show the output.
> Cheers -- Rick
>
> On March 29, 2018 10:17:13 PM EDT, Raymond Xie <xie3208...@gmail.com>
> wrote:
> > I am new to Solr, following Steve Rowe's example on
> >https://github.com/apache/lucene-solr/tree/master/solr/example/films:
> >
> >It would be greatly appreciated if anyone can enlighten me where to
> >start
> >troubleshooting, thank you very much in advance.
> >
> >The steps I followed are:
> >
> >Here ya go << END_OF_SCRIPT
> >
> >bin/solr stop
> >rm server/logs/*.log
> >rm -Rf server/solr/films/
> >bin/solr start
> >bin/solr create -c films
> >curl http://localhost:8983/solr/films/schema -X POST -H
> >'Content-type:application/json' --data-binary '{
> >"add-field" : {
> >"name":"name",
> >"type":"text_general",
> >"multiValued":false,
> >"stored":true
> >},
> >"add-field" : {
> >"name":"initial_release_date",
> >"type":"pdate",
> >"stored":true
> >}
> >}'
> >bin/post -c films example/films/films.json
> >curl http://localhost:8983/solr/films/config/params -H
> >'Content-type:application/json'  -d '{
> >"update" : {
> >  "facets": {
> >"facet.field":"genre"
> >}
> >  }
> >}'
> >
> ># END_OF_SCRIPT
> >
> >Additional fun -
> >
> >Add highlighting:
> >curl http://localhost:8983/solr/films/config/params -H
> >'Content-type:application/json'  -d '{
> >"set" : {
> >  "browse": {
> >"hl":"on",
> >"hl.fl":"name"
> >}
> >  }
> >}'
> >try http://localhost:8983/solr/films/browse?q=batman now, and you'll
> >see "batman" highlighted in the results
> >
> >
> >
> >I got nothing in my search:
> >
> >
> >
> >
> >**
> >*Sincerely yours,*
> >
> >
> >*Raymond*
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com


Re: How do I create a schema file for FIX data in Solr

2018-04-01 Thread Raymond Xie
I don't know why the mailing list stripped the highlighting from the tags;
anyway, I have explained the data structure, so hopefully you get the idea.

Thanks.

~~~sent from my cell phone, sorry if there is any typo

Raymond Xie <xie3208...@gmail.com> wrote on Sun, Apr 1, 2018 at 12:24 PM:

> At the moment I have no plans to stream the data.
>
> Note the raw data is saved on a Linux host; I need to index that raw data
> and provide search capabilities on it.
>
> The data is in FIX. I believe I would need to parse the data and create an
> index on the parsed data. I have never worked with FIX data or Solr, so any
> ideas are greatly appreciated. Thanks a lot in advance.
>
> Again, if you want to see the data a sample is in the original question.
>
> ~~~sent from my cell phone, sorry if there is any typo
>
> Raymond Xie <xie3208...@gmail.com> 于 2018年4月1日周日 下午12:12写道:
>
>> Thanks to all.
>>
>> FIX is a format standard of financial data. It contains lots of tags in
>> number with value for the tag, like 8=asdf, where 8 is the tag and asdf is
>> the tag's value. Each tag has its definition.
>>
>> The sample msg in FIX format was in the original question.
>>
>> All I need to do is to know how to paste the msg and get all tag's value.
>>
>> I found so far a parser is what I need to start with., But I am more
>> concerning about how to create index in Solr on the extracted tag's value,
>> that is the first step, the next would be to customize the dashboard for
>> users to search with a value to find out which msg contains that value in
>> which tag and present users the whole msg as proof.
>>
>> ~~~sent from my cell phone, sorry if there is any typo
>>
>> Rick Leir <rl...@leirtech.com> 于 2018年3月31日周六 下午6:00写道:
>>
>>> Raymond
>>> Will you be streaming the FIX data, perhaps with aggregation? Just a
>>> thought, I have no experience with FIX. Streaming opens up lots of
>>> questions.
>>> Cheers -- Rick
>>>
>>> On March 31, 2018 2:33:25 PM EDT, Walter Underwood <
>>> wun...@wunderwood.org> wrote:
>>> >Looks like Financial Information Exchange data, but, as Shawn says, the
>>> >real problem is what you want to do with it.
>>> >
>>> >* What fields will be searched? Those are indexed.
>>> >* What fields will be returned in the result? Those are stored.
>>> >* What is the data type for each field?
>>> >
>>> >I often store the data for most of the fields because it makes
>>> >debugging search problems so much easier.
>>> >
>>> >wunder
>>> >Walter Underwood
>>> >wun...@wunderwood.org
>>> >http://observer.wunderwood.org/  (my blog)
>>> >
>>> >> On Mar 31, 2018, at 11:29 AM, Shawn Heisey <apa...@elyograg.org>
>>> >wrote:
>>> >>
>>> >> On 3/31/2018 12:21 PM, Raymond Xie wrote:
>>> >>> I just started using Solr to create a Searching function on our
>>> >existing
>>> >>> data.
>>> >>>
>>> >>> The existing data is in FIX format sample as below:
>>> >> 
>>> >>> all the red tags (I didn't mark all of them) are fields with
>>> >definition
>>> >>> from FIX standard, I need to create index on all the tags, how do I
>>> >start?
>>> >>
>>> >> I do not know what FIX means, and there are no colors in your email.
>>> >>
>>> >> Can you elaborate?
>>> >>
>>> >> Fine-tuning the schema can be one of the most time-consuming parts of
>>> >setting up a Solr installation, and there are usually no easy quick
>>> >answers.  Exactly what to do will depend not only on the data that
>>> >you're indexing, but also what you want to do with it.
>>> >>
>>> >> Thanks,
>>> >> Shawn
>>> >>
>>>
>>> --
>>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>>
>>


Re: How do I create a schema file for FIX data in Solr

2018-04-01 Thread Raymond Xie
At the moment I have no plans to stream the data.

Note the raw data is saved on a Linux host; I need to index that raw data
and provide search capabilities on the data.

The data is in FIX format. I believe I would need to parse the data and
create an index on the parsed data. I have never worked with FIX data or
Solr, so any ideas are greatly appreciated. Thanks a lot in advance.

Again, if you want to see the data a sample is in the original question.

~~~sent from my cell phone, sorry if there is any typo



Re: How do I create a schema file for FIX data in Solr

2018-04-01 Thread Raymond Xie
Thanks to all.

FIX is a format standard for financial data. It consists of numbered tags,
each with a value, like 8=asdf, where 8 is the tag and asdf is the tag's
value. Each tag has its own definition.

The sample msg in FIX format was in the original question.

All I need to know is how to parse the msg and get each tag's value.

So far I have found that a parser is what I need to start with. But I am
more concerned about how to create an index in Solr on the extracted tag
values; that is the first step. The next would be to customize the dashboard
so users can search with a value to find out which msg contains that value
in which tag, and present users the whole msg as proof.
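As a rough sketch of that first step (everything here — the `fix_msgs` core name, the `tag_<n>_s` dynamic-field naming, the `to_solr_doc` helper — is a made-up illustration, not anything confirmed in this thread), a parsed tag→value map could be flattened into a document for Solr's JSON update endpoint like this:

```python
import json
import urllib.request

def to_solr_doc(msg_id: str, fields: dict) -> dict:
    """Map a parsed {tag: value} dict onto a flat Solr JSON document."""
    doc = {
        "id": msg_id,
        # keep the whole message so it can be shown to users as proof
        "raw_msg_s": " ".join(f"{t}={v}" for t, v in fields.items()),
    }
    for tag, value in fields.items():
        doc[f"tag_{tag}_s"] = value  # *_s = dynamic string field in the default schema
    return doc

doc = to_solr_doc("msg-1", {"8": "FIX.4.4", "55": "JNI253D8.OS"})
payload = json.dumps([doc]).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:8983/solr/fix_msgs/update?commit=true",
    data=payload,
    headers={"Content-Type": "application/json"},
)
# Needs a running Solr with a "fix_msgs" core -- uncomment to actually send:
# urllib.request.urlopen(req)
print(doc["tag_55_s"])  # JNI253D8.OS
```

With dynamic `*_s` fields, no per-tag schema work is needed up front; specific tags can later be promoted to typed fields once their definitions are mapped.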

~~~sent from my cell phone, sorry if there is any typo

Rick Leir <rl...@leirtech.com> wrote on Sat, Mar 31, 2018 at 6:00 PM:

> Raymond
> Will you be streaming the FIX data, perhaps with aggregation? Just a
> thought, I have no experience with FIX. Streaming opens up lots of
> questions.
> Cheers -- Rick
>
> On March 31, 2018 2:33:25 PM EDT, Walter Underwood <wun...@wunderwood.org>
> wrote:
> >Looks like Financial Information Exchange data, but, as Shawn says, the
> >real problem is what you want to do with it.
> >
> >* What fields will be searched? Those are indexed.
> >* What fields will be returned in the result? Those are stored.
> >* What is the data type for each field?
> >
> >I often store the data for most of the fields because it makes
> >debugging search problems so much easier.
> >
> >wunder
> >Walter Underwood
> >wun...@wunderwood.org
> >http://observer.wunderwood.org/  (my blog)
> >
> >> On Mar 31, 2018, at 11:29 AM, Shawn Heisey <apa...@elyograg.org>
> >wrote:
> >>
> >> On 3/31/2018 12:21 PM, Raymond Xie wrote:
> >>> I just started using Solr to create a Searching function on our
> >existing
> >>> data.
> >>>
> >>> The existing data is in FIX format sample as below:
> >> 
> >>> all the red tags (I didn't mark all of them) are fields with
> >definition
> >>> from FIX standard, I need to create index on all the tags, how do I
> >start?
> >>
> >> I do not know what FIX means, and there are no colors in your email.
> >>
> >> Can you elaborate?
> >>
> >> Fine-tuning the schema can be one of the most time-consuming parts of
> >setting up a Solr installation, and there are usually no easy quick
> >answers.  Exactly what to do will depend not only on the data that
> >you're indexing, but also what you want to do with it.
> >>
> >> Thanks,
> >> Shawn
> >>
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com


How do I create a schema file for FIX data in Solr

2018-03-31 Thread Raymond Xie
Hello,

I just started using Solr to create a search function on our existing
data.

The existing data is in FIX format; a sample is below:

8=FIX.4.4 9=653 35=RIO 1=TEST 11=3379122 38=1 44=2.0 39=A 40=2
49=VIPER 50=JPNIK01 54=1 55=JNI253D8.OS 56=XSVC 59=0 75=20180350
100=XOSE 10039=viperooe 10241=viperooe 150=A 372=D
122=20180320-08:08:35.038 10066=20180320-08:08:35.038
10436=20180320-08:08:35.038 202=25375.0 52=20180320-08:08:35.088
60=20180320-08:08:35.088 10071=20180320-08:08:35.088
11210=3379122 37=3379122 10184=3379122 201=1
29=4 10438=RIO.4.5 10005=178 10515=178 10518=178 581=13 660=102 1133=G
528=P 10104=Y 10202=APMKTMAKING 10208=APAC.VIPER.OOE 10217=Y 10292=115
11032=-1 382=0 10537=XOSE 15=JPY 167=OPT 48=179492540 455=179492540
22=101 456=101 151=1.0 421=JPN 10=200


All the red tags (I didn't mark all of them) are fields with definitions
from the FIX standard. I need to create an index on all the tags; how do I start?
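For what it's worth, a message like the sample above is just a sequence of tag=value pairs, so the parsing step can be sketched in a few lines (a rough illustration only — `parse_fix` is a hypothetical helper, and real FIX uses the SOH control character `\x01` as the delimiter rather than the spaces seen in the pasted sample):

```python
def parse_fix(msg: str) -> dict:
    """Return {tag: value} for every tag=value pair in the message."""
    # Normalize SOH-delimited input to whitespace, then split into pairs.
    pairs = msg.replace("\x01", " ").split()
    fields = {}
    for pair in pairs:
        tag, _, value = pair.partition("=")  # split on the first '=' only
        if tag and value:
            fields[tag] = value
    return fields

sample = "8=FIX.4.4 9=653 35=RIO 55=JNI253D8.OS 15=JPY 10=200"
print(parse_fix(sample)["55"])  # JNI253D8.OS
```

The resulting dict is a natural input for whatever indexing step follows.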

Thank you very much.

Sincerely yours,

Raymond

On Sat, Mar 31, 2018 at 12:24 AM, Randy Fradin 
wrote:

> I have a SolrCloud cluster (version 6.5.1) with around 3300 cores per
> instance. I've been investigating what is driving heap utilization since it
> is higher than I expected. I took a heap dump and found the largest driver
> of heap utilization is the array of VersionBucket objects in the
> org.apache.solr.update.VersionInfo class. The array is size 65536 and
> there
> is one per SolrCore instance. Each instance of the array is 1.8MB so the
> aggregate size is 6GB in heap.
>
> I understand from reading the discussion in SOLR-6820 that 65536 is the
> recommended default for this setting now because it results in higher
> document write rates than the old default of 256. I would like to reduce my
> heap utilization and I'm OK with somewhat slower document writing
> throughput. My question is: is it safe to reduce the value
> of numVersionBuckets on all of my existing cores without reindexing my
> data?
>
> My solrconfig.xml contains this for all of my collections:
>
> <updateHandler class="solr.DirectUpdateHandler2">
>   <updateLog>
>     <str name="dir">${solr.ulog.dir:}</str>
>     <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
>   </updateLog>
> </updateHandler>
>
> Assuming it is safe to change, can I just add a vm arg to the Solr process
> like "-Dsolr.ulog.numVersionBuckets=256" to override the value for all
> cores at once? Or do I have to change and re-upload the solrconfig.xml
> files and reload the cores?
>
> Thanks
>


Need help to get started on Solr, searching get nothing. Thank you very much in advance

2018-03-29 Thread Raymond Xie
 I am new to Solr, following Steve Rowe's example on
https://github.com/apache/lucene-solr/tree/master/solr/example/films:

It would be greatly appreciated if anyone can enlighten me where to start
troubleshooting, thank you very much in advance.

The steps I followed are:

Here ya go << END_OF_SCRIPT

bin/solr stop
rm server/logs/*.log
rm -Rf server/solr/films/
bin/solr start
bin/solr create -c films
curl http://localhost:8983/solr/films/schema -X POST -H
'Content-type:application/json' --data-binary '{
"add-field" : {
"name":"name",
"type":"text_general",
"multiValued":false,
"stored":true
},
"add-field" : {
"name":"initial_release_date",
"type":"pdate",
"stored":true
}
}'
bin/post -c films example/films/films.json
curl http://localhost:8983/solr/films/config/params -H
'Content-type:application/json'  -d '{
"update" : {
  "facets": {
"facet.field":"genre"
}
  }
}'

# END_OF_SCRIPT

Additional fun -

Add highlighting:
curl http://localhost:8983/solr/films/config/params -H
'Content-type:application/json'  -d '{
"set" : {
  "browse": {
"hl":"on",
"hl.fl":"name"
}
  }
}'
try http://localhost:8983/solr/films/browse?q=batman now, and you'll
see "batman" highlighted in the results
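One quick way to verify that the indexing step above actually loaded documents is a match-all query against the films core (a sketch using only Python's standard library; `select_url` is a made-up helper, and localhost:8983 assumes the default Solr port):

```python
import urllib.parse

def select_url(core: str, query: str = "*:*", rows: int = 0) -> str:
    """Build a /select query URL for the given core (match-all by default)."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"http://localhost:8983/solr/{core}/select?{params}"

url = select_url("films")
print(url)
# With Solr running, fetch it and inspect numFound:
#   import json, urllib.request
#   resp = json.load(urllib.request.urlopen(url))
#   print(resp["response"]["numFound"])  # 0 here means nothing was indexed
```

If numFound is 0, the problem is in the bin/post step rather than in the /browse or highlighting configuration.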



I got nothing in my search.

Sincerely yours,

Raymond


New Subscribe

2018-03-27 Thread Raymond Xie
Sincerely yours,

Raymond