Pardon, I meant: [solr-home]/server/logs/solr.log

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Oct 13, 2017 at 8:10 PM, Amrit Sarkar <sarkaramr...@gmail.com>
wrote:

> Ah, Docker. The logs are placed under [solr-home]/server/log/solr/log on
> the machine. I haven't played much with Docker; is there any way you can get
> that file from that location?
>
> On Fri, Oct 13, 2017 at 8:08 PM, Kevin Layer <la...@franz.com> wrote:
>
>> Amrit Sarkar wrote:
>>
>> >> Hi Kevin,
>> >>
>> >> Can you post the Solr log in the mail thread? At first glance at the
>> >> code, I don't think it handles .md by itself.
>>
>> How do I extract the log you want?
>>
>>
>> >>
>> >> On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer <la...@franz.com> wrote:
>> >>
>> >> > Amrit Sarkar wrote:
>> >> >
>> >> > >> Kevin,
>> >> > >>
>> >> > >> Just put "html" too and give it a shot. These are the types it is
>> >> > >> expecting:
>> >> >
>> >> > Same thing.
>> >> >
>> >> > >>
>> >> > >> mimeMap = new HashMap<>();
>> >> > >> mimeMap.put("xml", "application/xml");
>> >> > >> mimeMap.put("csv", "text/csv");
>> >> > >> mimeMap.put("json", "application/json");
>> >> > >> mimeMap.put("jsonl", "application/json");
>> >> > >> mimeMap.put("pdf", "application/pdf");
>> >> > >> mimeMap.put("rtf", "text/rtf");
>> >> > >> mimeMap.put("html", "text/html");
>> >> > >> mimeMap.put("htm", "text/html");
>> >> > >> mimeMap.put("doc", "application/msword");
>> >> > >> mimeMap.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
>> >> > >> mimeMap.put("ppt", "application/vnd.ms-powerpoint");
>> >> > >> mimeMap.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
>> >> > >> mimeMap.put("xls", "application/vnd.ms-excel");
>> >> > >> mimeMap.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
>> >> > >> mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
>> >> > >> mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
>> >> > >> mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
>> >> > >> mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
>> >> > >> mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
>> >> > >> mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
>> >> > >> mimeMap.put("txt", "text/plain");
>> >> > >> mimeMap.put("log", "text/plain");
>> >> > >>
>> >> > >> The keys are the types supported.
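
A minimal sketch of how a table like this could be consulted, to illustrate why "-filetypes md" ends in "unsupported type text/html". It assumes a page is accepted only when one of the requested extensions maps to the served Content-Type; that rule, the class name, and the helper signature are assumptions for illustration, not the actual SimplePostTool source. Only a subset of the table is reproduced.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MimeLookupSketch {
    // Subset of the extension-to-MIME table quoted above.
    static final Map<String, String> MIME_MAP = new HashMap<>();
    static {
        MIME_MAP.put("html", "text/html");
        MIME_MAP.put("htm", "text/html");
        MIME_MAP.put("txt", "text/plain");
        MIME_MAP.put("pdf", "application/pdf");
    }

    // Hypothetical helper: true when some requested extension (comma-separated
    // list, as passed to -filetypes) maps to the given content type.
    static boolean typeSupported(String contentType, String fileTypes) {
        List<String> requested = Arrays.asList(fileTypes.split(","));
        for (Map.Entry<String, String> e : MIME_MAP.entrySet()) {
            if (e.getValue().equals(contentType) && requested.contains(e.getKey())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "md" is not a key in the table, so nothing maps it to text/html.
        System.out.println(typeSupported("text/html", "md"));      // false
        // Adding "html" gives the lookup a key whose value is text/html.
        System.out.println(typeSupported("text/html", "md,html")); // true
    }
}
```

Under this assumed rule, "md" can never match because it is not a key in the table, whatever Content-Type the server sends.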
>> >> > >>
>> >> > >>
>> >> > >> On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar <sarkaramr...@gmail.com> wrote:
>> >> > >>
>> >> > >> > Ah!
>> >> > >> >
>> >> > >> > Only supported type is: text/html; encoding=utf-8
>> >> > >> >
>> >> > >> > I am not confident of this either :) but this should work.
>> >> > >> >
>> >> > >> > See the code-snippet below:
>> >> > >> >
>> >> > >> > ......
>> >> > >> >
>> >> > >> > if(res.httpStatus == 200) {
>> >> > >> >   // Raw content type of form "text/html; encoding=utf-8"
>> >> > >> >   String rawContentType = conn.getContentType();
>> >> > >> >   String type = rawContentType.split(";")[0];
>> >> > >> >   if(typeSupported(type) || "*".equals(fileTypes)) {
>> >> > >> >     String encoding = conn.getContentEncoding();
>> >> > >> >
>> >> > >> > ....
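
The split in the snippet above can be tried in isolation. This is a small sketch, not the tool's code: the class and helper names are hypothetical, and the trim() call and null check are additions for safety. It shows that header parameters such as charset are dropped before the type check, which matches the warning printing only "text/html".

```java
public class ContentTypeSketch {
    // Hypothetical helper: keep only the media type from a raw Content-Type
    // header value, discarding parameters such as charset or encoding.
    static String baseType(String rawContentType) {
        if (rawContentType == null) {
            return null; // the server may omit the header entirely
        }
        return rawContentType.split(";")[0].trim();
    }

    public static void main(String[] args) {
        System.out.println(baseType("text/html;charset=utf-8"));   // text/html
        System.out.println(baseType("text/html; encoding=utf-8")); // text/html
        System.out.println(baseType("text/html"));                 // text/html
    }
}
```

Since only the part before the first ";" takes part in the comparison, changing the charset parameter on the server side should not affect whether the page is accepted.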
>> >> > >> >
>> >> > >> >
>> >> > >> > On Fri, Oct 13, 2017 at 6:51 PM, Kevin Layer <la...@franz.com> wrote:
>> >> > >> >
>> >> > >> >> Amrit Sarkar wrote:
>> >> > >> >>
>> >> > >> >> >> Strange,
>> >> > >> >> >>
>> >> > >> >> >> Can you add "text/html;charset=utf-8"? This is the wiki.apache.org
>> >> > >> >> >> page's Content-Type. Let's see what it says now.
>> >> > >> >>
>> >> > >> >> Same thing.  Verified Content-Type:
>> >> > >> >>
>> >> > >> >> quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |& grep Content-Type
>> >> > >> >>   Content-Type: text/html;charset=utf-8
>> >> > >> >> quadra[git:master]$
>> >> > >> >>
>> >> > >> >> quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
>> >> > >> >> /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
>> >> > >> >> SimplePostTool version 5.0.0
>> >> > >> >> Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
>> >> > >> >> Entering auto mode. Indexing pages with content-types corresponding to file endings md
>> >> > >> >> SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
>> >> > >> >> Entering recursive mode, depth=10, delay=0s
>> >> > >> >> Entering crawl at level 0 (1 links total, 1 new)
>> >> > >> >> SimplePostTool: WARNING: Skipping URL with unsupported type text/html
>> >> > >> >> SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a HTTP result status of 415
>> >> > >> >> 0 web pages indexed.
>> >> > >> >> COMMITting Solr index changes to http://localhost:8983/solr/handbook/update/extract...
>> >> > >> >> Time spent: 0:00:00.531
>> >> > >> >> quadra[git:master]$
>> >> > >> >>
>> >> > >> >> Kevin
>> >> > >> >>
>> >> > >> >> >>
>> >> > >> >> >> On Fri, Oct 13, 2017 at 6:44 PM, Kevin Layer <la...@franz.com> wrote:
>> >> > >> >> >>
>> >> > >> >> >> > OK, so I hacked markserv to add Content-Type text/html, but now I get
>> >> > >> >> >> >
>> >> > >> >> >> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html
>> >> > >> >> >> >
>> >> > >> >> >> > What is it expecting?
>> >> > >> >> >> >
>> >> > >> >> >> > $ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
>> >> > >> >> >> > /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
>> >> > >> >> >> > SimplePostTool version 5.0.0
>> >> > >> >> >> > Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
>> >> > >> >> >> > Entering auto mode. Indexing pages with content-types corresponding to file endings md
>> >> > >> >> >> > SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
>> >> > >> >> >> > Entering recursive mode, depth=10, delay=0s
>> >> > >> >> >> > Entering crawl at level 0 (1 links total, 1 new)
>> >> > >> >> >> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html
>> >> > >> >> >> > SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a HTTP result status of 415
>> >> > >> >> >> > 0 web pages indexed.
>> >> > >> >> >> > COMMITting Solr index changes to http://localhost:8983/solr/handbook/update/extract...
>> >> > >> >> >> > Time spent: 0:00:03.882
>> >> > >> >> >> > $
>> >> > >> >> >> >
>> >> > >> >> >> > Thanks.
>> >> > >> >> >> >
>> >> > >> >> >> > Kevin
>> >> > >> >> >> >
>> >> > >> >>
>> >> > >> >
>> >> > >> >
>> >> >
>>
>
>
