[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397441#comment-16397441
 ] 

ASF GitHub Bot commented on SDAP-35:


Yongyao commented on issue #8: SDAP-35 Overhaul MUDROD configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/8#issuecomment-372772738
 
 
   Thanks. Got it.
   
   On Tue, Mar 13, 2018 at 1:05 PM, Lewis John McGibbney <
   notificati...@github.com> wrote:
   
   > @Yongyao  in id.apache.org you will see a
   > reset password option. Use it.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Overhaul MUDROD configuration
> -
>
> Key: SDAP-35
> URL: https://issues.apache.org/jira/browse/SDAP-35
> Project: Apache Science Data Analytics Platform
>  Issue Type: Task
>  Components: mudrod
>    Reporter: Lewis John McGibbney
>    Assignee: Yongyao Jiang
>    Priority: Major
>
> [~Yongyao] please augment the description here with your intended patch as 
> per https://github.com/aist-oceanworks/mudrod/pull/215
> Also, please name your branch and commit message after the issue you create 
> in JIRA. It makes things much easier as we try to improve the quality of our 
> source code review and development workflow. Thank you



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397272#comment-16397272
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on issue #8: SDAP-35 Overhaul MUDROD configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/8#issuecomment-372742125
 
 
   @Yongyao in id.apache.org you will see a reset password option. Use it.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397222#comment-16397222
 ] 

ASF GitHub Bot commented on SDAP-35:


Yongyao commented on issue #8: SDAP-35 Overhaul MUDROD configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/8#issuecomment-372731812
 
 
   @lewismc It turns out I don't have an Apache ID (https://id.apache.org/). I 
have access to https://issues.apache.org, but it looks like they are not the 
same thing. 
   As for the roster email, the website ( 
https://whimsy.apache.org/roster/ppmc/sdap) is asking for my password, which I 
don't have either.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397197#comment-16397197
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on issue #8: SDAP-35 Overhaul MUDROD configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/8#issuecomment-372727170
 
 
   > I don't have permission to merge the PR. I saw you sent out some apache 
roster a while ago. Is this related to that?
   
   See https://gitbox.apache.org/setup/




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397183#comment-16397183
 ] 

ASF GitHub Bot commented on SDAP-35:


Yongyao commented on issue #8: SDAP-35 Overhaul MUDROD configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/8#issuecomment-372721902
 
 
   @lewismc Format changed. I don't have permission to merge the PR. I saw you 
sent out some apache roster a while ago. Is this related to that?




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397060#comment-16397060
 ] 

ASF GitHub Bot commented on SDAP-35:


Yongyao commented on issue #8: SDAP-35 Overhaul MUDROD configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/8#issuecomment-372693740
 
 
   @lewismc Please check this new PR out. All change requests have been made 
except for the upper case one.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397056#comment-16397056
 ] 

ASF GitHub Bot commented on SDAP-35:


Yongyao opened a new pull request #8: SDAP-35 Overhaul MUDROD configuration
URL: https://github.com/apache/incubator-sdap-mudrod/pull/8
 
 
   




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397057#comment-16397057
 ] 

ASF GitHub Bot commented on SDAP-35:


asfgit commented on issue #8: SDAP-35 Overhaul MUDROD configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/8#issuecomment-372693186
 
 
   Can one of the admins verify this patch?




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396959#comment-16396959
 ] 

ASF GitHub Bot commented on SDAP-35:


Yongyao commented on issue #7: SDAP-35 Overhaul MUDROD configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#issuecomment-372672299
 
 
   @lewismc Yes, all make sense except the one I just replied to. I will submit 
a new PR today or tomorrow.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396957#comment-16396957
 ] 

ASF GitHub Bot commented on SDAP-35:


Yongyao commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r174137301
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/weblog/structure/ApacheAccessLog.java
 ##
 @@ -15,55 +15,49 @@
 
 import com.google.gson.Gson;
 
+import org.apache.sdap.mudrod.main.MudrodConstants;
+import org.apache.sdap.mudrod.weblog.pre.CrawlerDetection;
+
 import java.io.IOException;
 import java.io.Serializable;
 import java.text.ParseException;
 import java.text.SimpleDateFormat;
 import java.util.Date;
+import java.util.Properties;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
-import org.apache.sdap.mudrod.weblog.pre.CrawlerDetection;
-
 /**
  * This class represents an Apache access log line. See
  * http://httpd.apache.org/docs/2.2/logs.html for more details.
  */
 public class ApacheAccessLog extends WebLog implements Serializable {
-
-
-  /**
-   * 
-   */
-  private static final long serialVersionUID = 1L;
-
-  public ApacheAccessLog() {
-//default constructor
-  }
-
-  String response;
-  String referer;
-  String browser;
+  String Response;
 
 Review comment:
   These variables were all upper case originally. The previous 
enhancement PR changed them to lower case, which can result in a null-pointer 
error because they correspond to the field names in Elasticsearch, which that 
PR did not change. Therefore, I changed them back for the time being, as these 
field names are used explicitly in many places.
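
   The failure mode described above can be sketched as follows. This is a 
minimal illustration, not MUDROD code: the class name, the in-memory map that 
stands in for an Elasticsearch document, and the sample value are all 
assumptions. It only shows why a lookup with a differently cased field name 
comes back null and then fails on first use.

```java
import java.util.HashMap;
import java.util.Map;

public class FieldNameCase {

  // Hypothetical stand-in for a stored document whose field is named "Response".
  static Map<String, Object> sampleDocument() {
    Map<String, Object> doc = new HashMap<>();
    doc.put("Response", "200");
    return doc;
  }

  // Reads a field by name; map keys are case-sensitive, so a lookup with the
  // wrong case returns null and the toString() call below throws.
  static String readField(Map<String, Object> doc, String fieldName) {
    Object value = doc.get(fieldName);
    return value.toString();
  }

  public static void main(String[] args) {
    Map<String, Object> doc = sampleDocument();
    System.out.println(readField(doc, "Response"));
    try {
      readField(doc, "response");
    } catch (NullPointerException e) {
      System.out.println("lower-case lookup hit a null value");
    }
  }
}
```

   If the Java fields are renamed, the names used for lookups and 
serialization would presumably need to change in the same step, or be mapped 
explicitly to the stored names.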




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396497#comment-16396497
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on issue #7: SDAP-35 Overhaul MUDROD configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#issuecomment-372537001
 
 
   PING @Yongyao do the above changes make sense? If you can provide an update 
to this WAM then I can merge into my local working branch before submitting the 
storage re-architecture PR. Thanks. 




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393255#comment-16393255
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173517887
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/weblog/structure/ApacheAccessLog.java
 ##
 @@ -15,55 +15,49 @@
 
 import com.google.gson.Gson;
 
+import org.apache.sdap.mudrod.main.MudrodConstants;
+import org.apache.sdap.mudrod.weblog.pre.CrawlerDetection;
+
 import java.io.IOException;
 import java.io.Serializable;
 import java.text.ParseException;
 import java.text.SimpleDateFormat;
 import java.util.Date;
+import java.util.Properties;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
-import org.apache.sdap.mudrod.weblog.pre.CrawlerDetection;
-
 /**
  * This class represents an Apache access log line. See
  * http://httpd.apache.org/docs/2.2/logs.html for more details.
  */
 public class ApacheAccessLog extends WebLog implements Serializable {
-
-
-  /**
-   * 
-   */
-  private static final long serialVersionUID = 1L;
-
-  public ApacheAccessLog() {
-//default constructor
-  }
-
-  String response;
-  String referer;
-  String browser;
+  String Response;
 
 Review comment:
   No, it will not. You are essentially REVERTING the correct Java code 
convention here. The changes are restricted to this file.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393251#comment-16393251
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173517658
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/metadata/pre/ApiHarvester.java
 ##
 @@ -57,7 +57,8 @@ public Object execute() {
 //remove old metadata from ES
 es.deleteType(props.getProperty(MudrodConstants.ES_INDEX_NAME), 
props.getProperty(MudrodConstants.RAW_METADATA_TYPE));
 //harvest new metadata using PO.DAAC web services
-harvestMetadatafromWeb();
+if(props.getProperty(MudrodConstants.METADATA_DOWNLOAD).equals("1")) 
 
 Review comment:
   
https://stackoverflow.com/questions/24656018/string-literal-expressions-should-be-on-the-left-side-of-an-equals-comparison#24658779
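
   The point behind the linked answer can be sketched as below. This is an 
illustration only: the class name and the property key are hypothetical 
stand-ins (the real key is whatever MudrodConstants.METADATA_DOWNLOAD holds). 
When the property is set, both forms return the same result; they differ only 
when the property is missing and getProperty returns null.

```java
import java.util.Properties;

public class LiteralFirstEquals {

  // Hypothetical key, standing in for MudrodConstants.METADATA_DOWNLOAD.
  static final String METADATA_DOWNLOAD = "mudrod.metadata.download";

  // Value-first comparison: throws NullPointerException if the property is absent.
  static boolean valueFirst(Properties props) {
    return props.getProperty(METADATA_DOWNLOAD).equals("1");
  }

  // Literal-first comparison: an absent property simply yields false.
  static boolean literalFirst(Properties props) {
    return "1".equals(props.getProperty(METADATA_DOWNLOAD));
  }

  public static void main(String[] args) {
    Properties empty = new Properties();
    System.out.println(literalFirst(empty));
    try {
      valueFirst(empty);
    } catch (NullPointerException e) {
      System.out.println("value-first comparison threw NullPointerException");
    }
  }
}
```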




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393246#comment-16393246
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173517076
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/integration/LinkageIntegration.java
 ##
 @@ -173,32 +171,32 @@ public JsonObject getIngeratedListInJson(String input) {
* the similarities from different sources
*/
   public Map 
aggregateRelatedTermsFromAllmodel(String input) {
-aggregateRelatedTerms(input, props.getProperty("userHistoryLinkageType"));
-aggregateRelatedTerms(input, props.getProperty("clickStreamLinkageType"));
-aggregateRelatedTerms(input, props.getProperty("metadataLinkageType"));
-aggregateRelatedTermsSWEET(input, 
props.getProperty("ontologyLinkageType"));
+aggregateRelatedTerms(input, MudrodConstants.USE_HISTORY_LINKAGE_TYPE);
 
 Review comment:
   ACK




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393128#comment-16393128
 ] 

ASF GitHub Bot commented on SDAP-35:


Yongyao commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173500638
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/weblog/structure/ApacheAccessLog.java
 ##
 @@ -15,55 +15,49 @@
 
 import com.google.gson.Gson;
 
+import org.apache.sdap.mudrod.main.MudrodConstants;
+import org.apache.sdap.mudrod.weblog.pre.CrawlerDetection;
+
 import java.io.IOException;
 import java.io.Serializable;
 import java.text.ParseException;
 import java.text.SimpleDateFormat;
 import java.util.Date;
+import java.util.Properties;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
-import org.apache.sdap.mudrod.weblog.pre.CrawlerDetection;
-
 /**
  * This class represents an Apache access log line. See
  * http://httpd.apache.org/docs/2.2/logs.html for more details.
  */
 public class ApacheAccessLog extends WebLog implements Serializable {
-
-
-  /**
-   * 
-   */
-  private static final long serialVersionUID = 1L;
-
-  public ApacheAccessLog() {
-//default constructor
-  }
-
-  String response;
-  String referer;
-  String browser;
+  String Response;
 
 Review comment:
   I suggest creating a new issue about lower/upper case. If I just change them 
to lower case here, lots of other code will be affected




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393127#comment-16393127
 ] 

ASF GitHub Bot commented on SDAP-35:


Yongyao commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173500190
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/metadata/pre/ApiHarvester.java
 ##
 @@ -57,7 +57,8 @@ public Object execute() {
 //remove old metadata from ES
 es.deleteType(props.getProperty(MudrodConstants.ES_INDEX_NAME), 
props.getProperty(MudrodConstants.RAW_METADATA_TYPE));
 //harvest new metadata using PO.DAAC web services
-harvestMetadatafromWeb();
+if(props.getProperty(MudrodConstants.METADATA_DOWNLOAD).equals("1")) 
 
 Review comment:
   Could you explain the difference? The results look the same to me.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393126#comment-16393126
 ] 

ASF GitHub Bot commented on SDAP-35:


Yongyao commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173499832
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/integration/LinkageIntegration.java
 ##
 @@ -173,32 +171,32 @@ public JsonObject getIngeratedListInJson(String input) {
* the similarities from different sources
*/
   public Map 
aggregateRelatedTermsFromAllmodel(String input) {
-aggregateRelatedTerms(input, props.getProperty("userHistoryLinkageType"));
-aggregateRelatedTerms(input, props.getProperty("clickStreamLinkageType"));
-aggregateRelatedTerms(input, props.getProperty("metadataLinkageType"));
-aggregateRelatedTermsSWEET(input, 
props.getProperty("ontologyLinkageType"));
+aggregateRelatedTerms(input, MudrodConstants.USE_HISTORY_LINKAGE_TYPE);
 
 Review comment:
   This variable is not in the conf.properties file. It is just a constant.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393096#comment-16393096
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173493250
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/metadata/structure/MetadataExtractor.java
 ##
 @@ -67,31 +67,15 @@ public MetadataExtractor() {
* @param type  metadata type name
* @return metadata list
*/
-  protected List loadMetadataFromES(ESDriver es, String index, 
String type) {
+  protected List loadMetadataFromES(ESDriver es, String index, 
String type) {
 
-List metadatas = new ArrayList();
+List metadatas = new ArrayList();
 
 Review comment:
   There is no need to repeat the type argument on the right hand side of the 
assignment. Please just use the diamond operator ```<>```, i.e. remove 
```Metadata```
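
The diamond operator suggestion can be sketched as below. The element type String and the class name DiamondExample are placeholders assumed for illustration; the actual element type used in MetadataExtractor is not visible in the quoted diff.

```java
import java.util.ArrayList;
import java.util.List;

public class DiamondExample {

  // Hypothetical helper: String stands in for the metadata class
  // used by MetadataExtractor.
  static List<String> loadNames() {
    // Verbose form flagged in review: new ArrayList<String>()
    // Diamond form: the compiler infers <String> from the declared type.
    List<String> names = new ArrayList<>();
    names.add("dataset-a");
    names.add("dataset-b");
    return names;
  }

  public static void main(String[] args) {
    System.out.println(loadNames());
  }
}
```

The declared type on the left still gives full compile-time type checking, so nothing is lost by dropping the repeated type argument.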


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393094#comment-16393094
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173491332
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/driver/ESDriver.java
 ##
 @@ -561,6 +567,9 @@ public int getDocCount(String[] index, String[] type) {
 return this.getDocCount(index, type, search);
   }
 
+  /*
+   * Get the number of docs in a type of a index
 
 Review comment:
   Please add parameters to Javadoc
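
A sketch of what the requested Javadoc could look like. The method below is a hypothetical in-memory stand-in, not the real ESDriver.getDocCount, and the parameter descriptions are assumptions. Note that the quoted comment opens with /*, whereas the Javadoc tool only picks up comments that open with /**.

```java
public class JavadocExample {

  /**
   * Counts the documents that belong to the given index and type.
   * Hypothetical stand-in for an Elasticsearch lookup: it counts entries
   * in an in-memory array of "index/type" keys.
   *
   * @param index name of the index to search
   * @param type  name of the type within the index
   * @param docs  "index/type" keys of the stored documents
   * @return number of documents matching both index and type
   */
  static int getDocCount(String index, String type, String[] docs) {
    int count = 0;
    String key = index + "/" + type;
    for (String doc : docs) {
      if (key.equals(doc)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    String[] docs = { "mudrod/metadata", "mudrod/weblog", "mudrod/metadata" };
    System.out.println(getDocCount("mudrod", "metadata", docs));
  }
}
```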


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393104#comment-16393104
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173496602
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/weblog/pre/ImportLogFile.java
 ##
 @@ -306,9 +306,9 @@ public void parseSingleLineHTTP(String log, String index, 
String type) {
 CrawlerDetection crawlerDe = new CrawlerDetection(this.props, this.es, 
this.spark);
 if (!crawlerDe.checkKnownCrawler(agent)) {
   boolean tag = false;
-  String[] mimeTypes = { ".js", ".css", ".jpg", ".png", ".ico", 
"image_captcha", "autocomplete", ".gif", "/alldata/", "/api/", "get / 
http/1.1", ".jpeg", "/ws/" };
-  for (String mimeType : mimeTypes) {
-if (request.contains(mimeType)) {
+  String[] mimeTypes = 
props.getProperty(MudrodConstants.BLACK_LIST_REQUEST).split(",");
+  for (int i = 0; i < mimeTypes.length; i++) {
 
 Review comment:
   If possible use enhanced for loop
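
The enhanced for loop over the split property value could look roughly like this. The property key string and the class name are assumptions made for the sketch; the real key is defined in MudrodConstants and is not shown in the quoted diff.

```java
import java.util.Properties;

public class BlackListExample {

  // Hypothetical property key; the real one lives in MudrodConstants.
  static final String BLACK_LIST_REQUEST = "mudrod.request.blacklist";

  static boolean isBlackListed(Properties props, String request) {
    String[] markers = props.getProperty(BLACK_LIST_REQUEST).split(",");
    // Enhanced for loop: no index variable is needed.
    for (String marker : markers) {
      if (request.contains(marker)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(BLACK_LIST_REQUEST, ".js,.css,.png");
    System.out.println(isBlackListed(props, "get /site/main.css http/1.1"));
    System.out.println(isBlackListed(props, "get /dataset/abc http/1.1"));
  }
}
```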


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393113#comment-16393113
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173497391
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/weblog/structure/ApacheAccessLog.java
 ##
 @@ -15,55 +15,49 @@
 
 import com.google.gson.Gson;
 
+import org.apache.sdap.mudrod.main.MudrodConstants;
+import org.apache.sdap.mudrod.weblog.pre.CrawlerDetection;
+
 import java.io.IOException;
 import java.io.Serializable;
 import java.text.ParseException;
 import java.text.SimpleDateFormat;
 import java.util.Date;
+import java.util.Properties;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
-import org.apache.sdap.mudrod.weblog.pre.CrawlerDetection;
-
 /**
  * This class represents an Apache access log line. See
  * http://httpd.apache.org/docs/2.2/logs.html for more details.
  */
 public class ApacheAccessLog extends WebLog implements Serializable {
-
-
-  /**
-   * 
-   */
-  private static final long serialVersionUID = 1L;
-
-  public ApacheAccessLog() {
-//default constructor
-  }
-
-  String response;
-  String referer;
-  String browser;
+  String Response;
 
 Review comment:
   Unless these are private then they should be below the default constructor!
   Revert the global variable and constructor positioning


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393103#comment-16393103
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173496822
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/weblog/pre/SessionStatistic.java
 ##
 @@ -221,17 +221,17 @@ public int processSession(ESDriver es, String sessionId) 
throws IOException, Int
   request = matcher.group(1);
 }
 
-String datasetlist = "/datasetlist?";
-String dataset = "/dataset/";
+String datasetlist = props.getProperty(MudrodConstants.SEARCH_MARKER);
+String dataset = props.getProperty(MudrodConstants.VIEW_MARKER);
 if (request.contains(datasetlist)) {
   searchDataListRequestCount++;
 
   RequestUrl requestURL = new RequestUrl();
   String infoStr = requestURL.getSearchInfo(request) + ",";
-  String info = es.customAnalyzing(props.getProperty("indexName"), 
infoStr);
+  String info = 
es.customAnalyzing(props.getProperty(MudrodConstants.ES_INDEX_NAME), infoStr);
 
-  if (!",".equals(info)) {
-if ("".equals(keywords)) {
+  if (!info.equals(",")) {
 
 Review comment:
   Move string value to left hand side of statement


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393105#comment-16393105
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173498332
 
 

 ##
 File path: core/src/main/resources/config.properties
 ##
 @@ -0,0 +1,77 @@
+# Licensed under the Apache License, Version 2.0 (the "License"); 
+# you may not use this file except in compliance with the License. 
+# You may obtain  a copy of the License at 
+#  
+# http://www.apache.org/licenses/LICENSE-2.0 Unless 
+#  
+# required by applicable law or agreed to in writing, software 
+# distributed under the License is distributed on an "AS IS" 
+# BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either 
+# express or implied. See the License for the specific language 
+# governing permissions and limitations under the License. 
+# Define some default values that can be overridden by system properties
+# Logging Threshold
 
 Review comment:
   Remove this useless statement


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393092#comment-16393092
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173492972
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/metadata/pre/ApiHarvester.java
 ##
 @@ -57,7 +57,8 @@ public Object execute() {
 //remove old metadata from ES
 es.deleteType(props.getProperty(MudrodConstants.ES_INDEX_NAME), 
props.getProperty(MudrodConstants.RAW_METADATA_TYPE));
 //harvest new metadata using PO.DAAC web services
-harvestMetadatafromWeb();
+if(props.getProperty(MudrodConstants.METADATA_DOWNLOAD).equals("1")) 
 
 Review comment:
   I have a feeling that the string should be on the left hand side of the 
logic as per 
   ```
   if("1".equals(props.getProperty(MudrodConstants.METADATA_DOWNLOAD)))
   ```
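
A minimal sketch of why the literal-first form is the safer one. Properties.getProperty returns null when a key is missing, so calling equals on the returned value can throw a NullPointerException, while calling it on the literal cannot. The property key below is a hypothetical stand-in for MudrodConstants.METADATA_DOWNLOAD.

```java
import java.util.Properties;

public class LiteralFirstExample {

  // Hypothetical key standing in for MudrodConstants.METADATA_DOWNLOAD.
  static final String METADATA_DOWNLOAD = "mudrod.metadata.download";

  static boolean downloadEnabled(Properties props) {
    // getProperty returns null when the key is absent; calling equals on
    // the literal avoids a NullPointerException in that case.
    return "1".equals(props.getProperty(METADATA_DOWNLOAD));
  }

  public static void main(String[] args) {
    Properties empty = new Properties();
    System.out.println(downloadEnabled(empty));

    Properties enabled = new Properties();
    enabled.setProperty(METADATA_DOWNLOAD, "1");
    System.out.println(downloadEnabled(enabled));
  }
}
```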


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393098#comment-16393098
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173494328
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/process/FeatureBasedSimilarity.java
 ##
 @@ -0,0 +1,211 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License"); you 
+ * may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+/**
+ * This package includes the preprocessing, processing, and data structure used
+ * by recommendation module.
+ */
+package org.apache.sdap.mudrod.recommendation.process;
+
+import org.apache.sdap.mudrod.discoveryengine.DiscoveryStepAbstract;
+import org.apache.sdap.mudrod.driver.ESDriver;
+import org.apache.sdap.mudrod.driver.SparkDriver;
+import org.apache.sdap.mudrod.main.MudrodConstants;
+import org.apache.sdap.mudrod.recommendation.structure.MetadataFeature;
+import org.apache.sdap.mudrod.recommendation.structure.PODAACMetadataFeature;
+
+import org.elasticsearch.action.index.IndexRequest;
+import org.elasticsearch.action.search.SearchResponse;
+import org.elasticsearch.action.update.UpdateRequest;
+import org.elasticsearch.common.unit.TimeValue;
+import org.elasticsearch.common.xcontent.XContentBuilder;
+import org.elasticsearch.index.query.QueryBuilders;
+import org.elasticsearch.search.SearchHit;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.text.DecimalFormat;
+import java.util.*;
 
 Review comment:
   Never use wildcard imports. Always make individual imports so that the 
classes a file depends on stay explicit.
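
A sketch of replacing the java.util wildcard with individual imports. The set of imported classes is an assumption: only Map and Properties are visible in the quoted part of the diff, so the real file may need others. The helper method is hypothetical.

```java
// Individual imports instead of: import java.util.*;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ExplicitImportsExample {

  // Hypothetical helper that copies properties into a map.
  static Map<String, String> toMap(Properties props) {
    Map<String, String> result = new HashMap<>();
    for (String name : props.stringPropertyNames()) {
      result.put(name, props.getProperty(name));
    }
    return result;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("indexName", "mudrod");
    System.out.println(toMap(props));
  }
}
```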


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393118#comment-16393118
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173496195
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/ssearch/Ranker.java
 ##
 @@ -35,14 +35,12 @@
*/
   private static final long serialVersionUID = 1L;
   transient List resultList = new ArrayList<>();
-
-  String learnerType = null;
   Learner le = null;
 
-  public Ranker(Properties props, ESDriver es, SparkDriver spark, String 
learnerType) {
+  public Ranker(Properties props, ESDriver es, SparkDriver spark) {
 super(props, es, spark);
-this.learnerType = learnerType;
-le = new Learner(learnerType, spark, 
props.getProperty(MudrodConstants.SVM_SGD_MODEL));
+if(props.getProperty(MudrodConstants.RANKING_ML).equals("1"))
 
 Review comment:
   The string value should be on the left hand side of the equals statement 
e.g. ```
   if("1".equals(props.getProperty(MudrodConstants.RANKING_ML)))
   ```


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393109#comment-16393109
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173496774
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/weblog/pre/SessionStatistic.java
 ##
 @@ -82,9 +82,9 @@ public Object execute() {
 
   public void processSession() throws InterruptedException, IOException, 
ExecutionException {
 String processingType = props.getProperty(MudrodConstants.PROCESS_TYPE);
-if ("sequential".equals(processingType)) {
+if (processingType.equals("sequential")) {
   processSessionInSequential();
-} else if ("parallel".equals(processingType)) {
+} else if (processingType.equals("parallel")) {
 
 Review comment:
   Move string value to left hand side of statement


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393114#comment-16393114
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173495801
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/structure/MetadataTokenizer.java
 ##
 @@ -109,10 +106,12 @@ public MetadataOpt(Properties props) {
 
   for (SearchHit hit : scrollResp.getHits().getHits()) {
 Map result = hit.getSource();
-String shortName = (String) result.get("Dataset-ShortName");
+String shortName = (String) result.get(metadataName);
 
 String filedStr = "";
-for (String filed : variables) {
+int size = variables.size();
 
 Review comment:
   Use enhanced for loop if possible


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393100#comment-16393100
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173494466
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/process/FeatureBasedSimilarity.java
 ##
 @@ -0,0 +1,211 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License"); you 
+ * may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+/**
+ * This package includes the preprocessing, processing, and data structure used
+ * by recommendation module.
+ */
+package org.apache.sdap.mudrod.recommendation.process;
+
+import org.apache.sdap.mudrod.discoveryengine.DiscoveryStepAbstract;
+import org.apache.sdap.mudrod.driver.ESDriver;
+import org.apache.sdap.mudrod.driver.SparkDriver;
+import org.apache.sdap.mudrod.main.MudrodConstants;
+import org.apache.sdap.mudrod.recommendation.structure.MetadataFeature;
+import org.apache.sdap.mudrod.recommendation.structure.PODAACMetadataFeature;
+
+import org.elasticsearch.action.index.IndexRequest;
+import org.elasticsearch.action.search.SearchResponse;
+import org.elasticsearch.action.update.UpdateRequest;
+import org.elasticsearch.common.unit.TimeValue;
+import org.elasticsearch.common.xcontent.XContentBuilder;
+import org.elasticsearch.index.query.QueryBuilders;
+import org.elasticsearch.search.SearchHit;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.text.DecimalFormat;
+import java.util.*;
+
+import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
+
+public class FeatureBasedSimilarity extends DiscoveryStepAbstract implements 
Serializable {
+
+  /**
+   *
+   */
+  private static final long serialVersionUID = 1L;
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(FeatureBasedSimilarity.class);
+
+  private DecimalFormat df = new DecimalFormat("#.000");
+  // a map from variable to its type
+  MetadataFeature metadata = null;
+  public Map variableTypes;
+  public Map variableWeights;
+
+
+  // index name
+  private String indexName;
+  // type name of metadata in ES
+  private String metadataType;
+  private String variableSimType;
+
+  /**
+   * Creates a new instance of OHEncoder.
+   *
+   * @param props the Mudrod configuration
+   * @param es an instantiated {@link ESDriver}
+   * @param spark an instantiated {@link SparkDriver}
+   */
+  public FeatureBasedSimilarity(Properties props, ESDriver es, SparkDriver 
spark) {
+super(props, es, spark);
+
+indexName = props.getProperty(MudrodConstants.ES_INDEX_NAME);
+metadataType = MudrodConstants.RECOM_METADATA_TYPE;
+variableSimType = MudrodConstants.METADATA_FEATURE_SIM_TYPE;
+ 
+// !!! important, please change to other class when using other metadata
+metadata = new PODAACMetadataFeature();
+metadata.inital();
+variableTypes = metadata.featureTypes;
+variableWeights = metadata.featureWeights;
+  }
+
+  @Override
+  public Object execute() {
+LOG.info("*calculating metadata feature based similarity 
starts**");
 
 Review comment:
   Remove ```*``` from logging it is untidy.


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393117#comment-16393117
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173495526
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/structure/MetadataFeature.java
 ##
 @@ -0,0 +1,71 @@
+package org.apache.sdap.mudrod.recommendation.structure;
+
+import org.apache.sdap.mudrod.driver.ESDriver;
+import org.apache.sdap.mudrod.driver.SparkDriver;
+import org.apache.sdap.mudrod.main.MudrodConstants;
+import org.apache.sdap.mudrod.utils.LabeledRowMatrix;
+import org.apache.sdap.mudrod.utils.MatrixUtil;
+import org.apache.spark.api.java.JavaPairRDD;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.function.PairFunction;
+import org.apache.spark.mllib.linalg.distributed.RowMatrix;
+import org.elasticsearch.action.search.SearchResponse;
+import org.elasticsearch.common.unit.TimeValue;
+import org.elasticsearch.common.xcontent.XContentBuilder;
+import org.elasticsearch.index.query.QueryBuilders;
+import org.elasticsearch.search.SearchHit;
+import scala.Tuple2;
+import scala.tools.nsc.transform.SpecializeTypes.Abstract;
+
+import java.io.Serializable;
+import java.util.*;
 
 Review comment:
   Never use wildcard imports


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393101#comment-16393101
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173495186
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/process/FeatureBasedSimilarity.java
 ##
 @@ -0,0 +1,211 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License"); you 
+ * may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+/**
+ * This package includes the preprocessing, processing, and data structure used
+ * by recommendation module.
+ */
+package org.apache.sdap.mudrod.recommendation.process;
+
+import org.apache.sdap.mudrod.discoveryengine.DiscoveryStepAbstract;
+import org.apache.sdap.mudrod.driver.ESDriver;
+import org.apache.sdap.mudrod.driver.SparkDriver;
+import org.apache.sdap.mudrod.main.MudrodConstants;
+import org.apache.sdap.mudrod.recommendation.structure.MetadataFeature;
+import org.apache.sdap.mudrod.recommendation.structure.PODAACMetadataFeature;
+
+import org.elasticsearch.action.index.IndexRequest;
+import org.elasticsearch.action.search.SearchResponse;
+import org.elasticsearch.action.update.UpdateRequest;
+import org.elasticsearch.common.unit.TimeValue;
+import org.elasticsearch.common.xcontent.XContentBuilder;
+import org.elasticsearch.index.query.QueryBuilders;
+import org.elasticsearch.search.SearchHit;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.text.DecimalFormat;
+import java.util.*;
+
+import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
+
+public class FeatureBasedSimilarity extends DiscoveryStepAbstract implements 
Serializable {
+
+  /**
+   *
+   */
+  private static final long serialVersionUID = 1L;
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(FeatureBasedSimilarity.class);
+
+  private DecimalFormat df = new DecimalFormat("#.000");
+  // a map from variable to its type
+  MetadataFeature metadata = null;
+  public Map variableTypes;
+  public Map variableWeights;
+
+
+  // index name
+  private String indexName;
+  // type name of metadata in ES
+  private String metadataType;
+  private String variableSimType;
+
+  /**
+   * Creates a new instance of OHEncoder.
+   *
+   * @param props the Mudrod configuration
+   * @param es an instantiated {@link ESDriver}
+   * @param spark an instantiated {@link SparkDriver}
+   */
+  public FeatureBasedSimilarity(Properties props, ESDriver es, SparkDriver 
spark) {
+super(props, es, spark);
+
+indexName = props.getProperty(MudrodConstants.ES_INDEX_NAME);
+metadataType = MudrodConstants.RECOM_METADATA_TYPE;
+variableSimType = MudrodConstants.METADATA_FEATURE_SIM_TYPE;
+ 
+// !!! important, please change to other class when using other metadata
+metadata = new PODAACMetadataFeature();
+metadata.inital();
+variableTypes = metadata.featureTypes;
+variableWeights = metadata.featureWeights;
+  }
+
+  @Override
+  public Object execute() {
+LOG.info("*calculating metadata feature based similarity 
starts**");
+startTime = System.currentTimeMillis();
+es.deleteType(indexName, variableSimType);
+addMapping(es, indexName, variableSimType);
+
+featureSimilarity(es);
+es.refreshIndex();
+normalizeVariableWeight(es);
+es.refreshIndex();
+endTime = System.currentTimeMillis();
+LOG.info("*calculating metadata feature based similarity 
ends**Took {}s", (endTime - startTime) / 1000);
+return null;
+  }
+
+  @Override
+  public Object execute(Object o) {
+return null;
+  }
+
+  public void featureSimilarity(ESDriver es) {
+
+es.createBulkProcessor();
+
+List<Map<String, Object>> metadatas = new ArrayList<>();
+SearchResponse scrollResp = 
es.getClient().prepareSearch(indexName).setTypes(metadataType).setScroll(new 
TimeValue(6)).setQuery(QueryBuilders.matchAllQuery()).setSize(100).execute()
+.actionGet();
+while (true) {
+  for (SearchHit hit : scrollResp.getHits().getHits()) {
+Map<String, Object> metadataA = hit.getSource();
+metadatas.add(metadataA);
+  }
+
+  scrollResp = 

[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393102#comment-16393102
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173495881
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/structure/PODAACMetadataFeature.java
 ##
 @@ -0,0 +1,360 @@
+package org.apache.sdap.mudrod.recommendation.structure;
+
+import org.apache.sdap.mudrod.driver.ESDriver;
+import org.apache.sdap.mudrod.driver.SparkDriver;
+import org.apache.sdap.mudrod.main.MudrodConstants;
+import org.apache.sdap.mudrod.utils.LabeledRowMatrix;
+import org.apache.sdap.mudrod.utils.MatrixUtil;
+import org.apache.spark.api.java.JavaPairRDD;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.function.PairFunction;
+import org.apache.spark.mllib.linalg.distributed.RowMatrix;
+import org.elasticsearch.action.search.SearchResponse;
+import org.elasticsearch.common.unit.TimeValue;
+import org.elasticsearch.common.xcontent.XContentBuilder;
+import org.elasticsearch.index.query.QueryBuilders;
+import org.elasticsearch.search.SearchHit;
+import scala.Tuple2;
+import scala.tools.nsc.transform.SpecializeTypes.Abstract;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.*;
 
 Review comment:
   Never use wildcard imports
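
As an illustration of the alternative, here is a minimal sketch that lists each `java.util` type explicitly instead of using `import java.util.*;`. The class and method names are hypothetical and are not taken from the pull request; the point is only that every imported type is visible at the top of the file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class; not part of MUDROD.
final class ExplicitImportsSketch {

  /** Groups names by their first character, preserving input order. */
  static Map<Character, List<String>> groupByInitial(List<String> names) {
    Map<Character, List<String>> groups = new LinkedHashMap<>();
    for (String name : names) {
      if (name.isEmpty()) {
        continue; // skip empty names, which have no first character
      }
      groups.computeIfAbsent(name.charAt(0), k -> new ArrayList<>()).add(name);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<String> names = new ArrayList<>();
    names.add("sensor");
    names.add("platform");
    names.add("spatial");
    System.out.println(groupByInitial(names));
  }
}
```

With explicit imports, a reader can tell which `List` or `Map` is meant without knowing the contents of every wildcard-imported package.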


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393119#comment-16393119
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173496894
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/weblog/pre/SessionStatistic.java
 ##
 @@ -249,19 +249,13 @@ public int processSession(ESDriver es, String sessionId) 
throws IOException, Int
   searchDataRequestCount++;
   if (findDataset(request) != null) {
 String view = findDataset(request);
-
-if ("".equals(views)) {
+if (views.equals("")) 
 
 Review comment:
   Move string value to right hand side of statement
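
For context on why the position of the literal matters, here is a minimal sketch of the two forms that appear in the diff. The class and method names are hypothetical; the behaviour shown is standard `String.equals` semantics when the variable is `null`.

```java
// Hypothetical example class; not part of MUDROD.
final class LiteralEqualsSketch {

  /** Literal as the receiver: never throws, and returns false for null. */
  static boolean literalFirst(String views) {
    return "".equals(views);
  }

  /** Variable as the receiver: throws NullPointerException when views is null. */
  static boolean variableFirst(String views) {
    return views.equals("");
  }

  public static void main(String[] args) {
    System.out.println(literalFirst(null));
    System.out.println(literalFirst(""));
    try {
      variableFirst(null);
    } catch (NullPointerException e) {
      System.out.println("NullPointerException for null input");
    }
  }
}
```

Both forms agree for non-null input; they differ only when the variable can be `null`.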


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393093#comment-16393093
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173492545
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/main/MudrodEngine.java
 ##
 @@ -138,48 +138,37 @@ private InputStream locateConfig() {
   LOG.info("Loaded config file from " + configFile.getAbsolutePath());
   return configStream;
 } catch (IOException e) {
-  LOG.info("File specified by environment variable " + 
MudrodConstants.MUDROD_CONFIG + "=\'" + configLocation + "\' could not be 
loaded. " + e.getMessage());
+  LOG.info("File specified by environment variable " + 
MudrodConstants.MUDROD_CONFIG + "=\'" + configLocation + "\' could not be 
loaded. Default configuration will be used." + e.getMessage());
 
 Review comment:
   Please change this to [parameterized 
logging](https://www.slf4j.org/faq.html#logging_performance) and use it for all 
future logging. The implementation should be 
   ```
   LOG.info("File specified by environment variable {} = '{}' could not be loaded. Default configuration will be used. {}",
       MudrodConstants.MUDROD_CONFIG, configLocation, e.getMessage());
   ```
   Please use parameterized logging whenever possible: it avoids building the 
message string when the log level is disabled, which makes it much more 
efficient than string concatenation.
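
The cost difference can be made visible with a small sketch. To stay self-contained it uses `java.util.logging` from the JDK as a stand-in for SLF4J, so the placeholder syntax is `{0}` rather than `{}`. The class, method, and logger names are hypothetical. It counts how often `toString()` runs on the argument when the INFO level is disabled.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example class; uses java.util.logging, not SLF4J.
final class LazyLoggingSketch {

  /** Counts how often toString() runs, to make the formatting cost visible. */
  static final class Tracked {
    int toStringCalls = 0;

    @Override
    public String toString() {
      toStringCalls++;
      return "tracked-value";
    }
  }

  /** Logs at INFO via string concatenation; returns the toString() call count. */
  static int concatenated(Logger log) {
    Tracked arg = new Tracked();
    log.info("Config location " + arg + " could not be loaded.");
    return arg.toStringCalls;
  }

  /** Logs at INFO with a placeholder; returns the toString() call count. */
  static int parameterized(Logger log) {
    Tracked arg = new Tracked();
    log.log(Level.INFO, "Config location {0} could not be loaded.", arg);
    return arg.toStringCalls;
  }

  /** Returns a logger (hypothetical name) with INFO disabled. */
  static Logger quietLogger() {
    Logger log = Logger.getLogger("lazy.logging.sketch");
    log.setLevel(Level.WARNING);
    return log;
  }

  public static void main(String[] args) {
    Logger log = quietLogger();
    System.out.println("concatenation toString() calls: " + concatenated(log));
    System.out.println("parameterized toString() calls: " + parameterized(log));
  }
}
```

With INFO disabled, the concatenated call still converts its argument once before the logger can discard the message, while the parameterized call never touches it.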


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393106#comment-16393106
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173495724
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/structure/MetadataTokenizer.java
 ##
 @@ -18,7 +18,7 @@
 import java.io.Serializable;
 import java.util.*;
 
 Review comment:
   Never use wildcard imports


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393110#comment-16393110
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173495835
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/structure/PODAACMetadataFeature.java
 ##
 @@ -0,0 +1,360 @@
+package org.apache.sdap.mudrod.recommendation.structure;
+
+import org.apache.sdap.mudrod.driver.ESDriver;
+import org.apache.sdap.mudrod.driver.SparkDriver;
 
 Review comment:
   Add license header


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393112#comment-16393112
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173495472
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/structure/MetadataFeature.java
 ##
 @@ -0,0 +1,71 @@
+package org.apache.sdap.mudrod.recommendation.structure;
 
 Review comment:
   Add license header


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393111#comment-16393111
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173497664
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/weblog/structure/ApacheAccessLog.java
 ##
 @@ -15,55 +15,49 @@
 
 import com.google.gson.Gson;
 
+import org.apache.sdap.mudrod.main.MudrodConstants;
+import org.apache.sdap.mudrod.weblog.pre.CrawlerDetection;
+
 import java.io.IOException;
 import java.io.Serializable;
 import java.text.ParseException;
 import java.text.SimpleDateFormat;
 import java.util.Date;
+import java.util.Properties;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
-import org.apache.sdap.mudrod.weblog.pre.CrawlerDetection;
-
 /**
  * This class represents an Apache access log line. See
  * http://httpd.apache.org/docs/2.2/logs.html for more details.
  */
 public class ApacheAccessLog extends WebLog implements Serializable {
-
-
-  /**
-   * 
-   */
-  private static final long serialVersionUID = 1L;
-
-  public ApacheAccessLog() {
-//default constructor
-  }
-
-  String response;
-  String referer;
-  String browser;
+  String Response;
 
 Review comment:
   Additionally, Java variable declarations should be ```firstLowerCamelCase``` 
NOT ```FirstCapitalCamelCase```


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393120#comment-16393120
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173496493
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/weblog/pre/HistoryGenerator.java
 ##
 @@ -79,13 +77,21 @@ public void generateBinaryMatrix() {
   String[] logIndices = logIndexList.toArray(new String[0]);
   String[] statictypeArray = new String[] { this.sessionStats };
   int docCount = es.getDocCount(logIndices, statictypeArray);
+  
+  LOG.info(this.sessionStats + ":" + docCount);  
+  if (docCount==0) 
+  { 
+bw.close(); 
+file.delete();
+return;
+  }
 
   SearchResponse sr = 
es.getClient().prepareSearch(logIndices).setTypes(statictypeArray).setQuery(QueryBuilders.matchAllQuery()).setSize(0)
   
.addAggregation(AggregationBuilders.terms("IPs").field("IP").size(docCount)).execute().actionGet();
   Terms ips = sr.getAggregations().get("IPs");
   List ipList = new ArrayList<>();
   for (Terms.Bucket entry : ips.getBuckets()) {
-if (entry.getDocCount() > 
Integer.parseInt(props.getProperty(MudrodConstants.MINI_USER_HISTORY))) { // 
filter
+if (entry.getDocCount() >= 
Integer.parseInt(props.getProperty(MudrodConstants.QUERY_MIN))) { // filter
 
 Review comment:
   Why do we now make this inclusive?
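
To make the question concrete, here is a minimal sketch of what the operator change does at the boundary. The class name, method name, and counts are made up for illustration and are not taken from the pull request. An entry whose count equals the threshold is dropped by `>` and kept by `>=`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; not part of MUDROD.
final class ThresholdSketch {

  /** Keeps the counts that pass the threshold; inclusive selects >= over >. */
  static List<Long> keep(List<Long> docCounts, long min, boolean inclusive) {
    List<Long> kept = new ArrayList<>();
    for (long count : docCounts) {
      boolean passes = inclusive ? count >= min : count > min;
      if (passes) {
        kept.add(count);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Long> counts = Arrays.asList(1L, 5L, 9L); // made-up per-IP counts
    System.out.println(keep(counts, 5, false)); // prints [9]
    System.out.println(keep(counts, 5, true));  // prints [5, 9]
  }
}
```

So the change only affects entries that sit exactly on the configured minimum.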


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393095#comment-16393095
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173495176
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/process/FeatureBasedSimilarity.java
 ##
 @@ -0,0 +1,211 @@

[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393115#comment-16393115
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173496855
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/weblog/pre/SessionStatistic.java
 ##
 @@ -221,17 +221,17 @@ public int processSession(ESDriver es, String sessionId) 
throws IOException, Int
   request = matcher.group(1);
 }
 
-String datasetlist = "/datasetlist?";
-String dataset = "/dataset/";
+String datasetlist = props.getProperty(MudrodConstants.SEARCH_MARKER);
+String dataset = props.getProperty(MudrodConstants.VIEW_MARKER);
 if (request.contains(datasetlist)) {
   searchDataListRequestCount++;
 
   RequestUrl requestURL = new RequestUrl();
   String infoStr = requestURL.getSearchInfo(request) + ",";
-  String info = es.customAnalyzing(props.getProperty("indexName"), 
infoStr);
+  String info = 
es.customAnalyzing(props.getProperty(MudrodConstants.ES_INDEX_NAME), infoStr);
 
-  if (!",".equals(info)) {
-if ("".equals(keywords)) {
+  if (!info.equals(",")) {
+if (keywords.equals("")) {
 
 Review comment:
   Move string value to right hand side of statement


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393121#comment-16393121
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173495358
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/process/FeatureBasedSimilarity.java
 ##
 @@ -0,0 +1,211 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License"); you 
+ * may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+/**
+ * This package includes the preprocessing, processing, and data structure used
+ * by recommendation module.
+ */
+package org.apache.sdap.mudrod.recommendation.process;
+
+import org.apache.sdap.mudrod.discoveryengine.DiscoveryStepAbstract;
+import org.apache.sdap.mudrod.driver.ESDriver;
+import org.apache.sdap.mudrod.driver.SparkDriver;
+import org.apache.sdap.mudrod.main.MudrodConstants;
+import org.apache.sdap.mudrod.recommendation.structure.MetadataFeature;
+import org.apache.sdap.mudrod.recommendation.structure.PODAACMetadataFeature;
+
+import org.elasticsearch.action.index.IndexRequest;
+import org.elasticsearch.action.search.SearchResponse;
+import org.elasticsearch.action.update.UpdateRequest;
+import org.elasticsearch.common.unit.TimeValue;
+import org.elasticsearch.common.xcontent.XContentBuilder;
+import org.elasticsearch.index.query.QueryBuilders;
+import org.elasticsearch.search.SearchHit;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.text.DecimalFormat;
+import java.util.*;
+
+import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
+
+public class FeatureBasedSimilarity extends DiscoveryStepAbstract implements 
Serializable {
+
+  /**
+   *
+   */
+  private static final long serialVersionUID = 1L;
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(FeatureBasedSimilarity.class);
+
+  private DecimalFormat df = new DecimalFormat("#.000");
+  // a map from variable to its type
+  MetadataFeature metadata = null;
+  public Map variableTypes;
+  public Map variableWeights;
+
+
+  // index name
+  private String indexName;
+  // type name of metadata in ES
+  private String metadataType;
+  private String variableSimType;
+
+  /**
+   * Creates a new instance of OHEncoder.
+   *
+   * @param props the Mudrod configuration
+   * @param esan instantiated {@link ESDriver}
+   * @param spark an instantiated {@link SparkDriver}
+   */
+  public FeatureBasedSimilarity(Properties props, ESDriver es, SparkDriver 
spark) {
+super(props, es, spark);
+
+indexName = props.getProperty(MudrodConstants.ES_INDEX_NAME);
+metadataType = MudrodConstants.RECOM_METADATA_TYPE;
+variableSimType = MudrodConstants.METADATA_FEATURE_SIM_TYPE;
+ 
+// !!! important, please change to other class when using other metadata
+metadata = new PODAACMetadataFeature();
+metadata.inital();
+variableTypes = metadata.featureTypes;
+variableWeights = metadata.featureWeights;
+  }
+
+  @Override
+  public Object execute() {
+LOG.info("*calculating metadata feature based similarity 
starts**");
+startTime = System.currentTimeMillis();
+es.deleteType(indexName, variableSimType);
+addMapping(es, indexName, variableSimType);
+
+featureSimilarity(es);
+es.refreshIndex();
+normalizeVariableWeight(es);
+es.refreshIndex();
+endTime = System.currentTimeMillis();
+LOG.info("*calculating metadata feature based similarity 
ends**Took {}s", (endTime - startTime) / 1000);
+return null;
+  }
+
+  @Override
+  public Object execute(Object o) {
+return null;
+  }
+
+  public void featureSimilarity(ESDriver es) {
+
+es.createBulkProcessor();
+
+List> metadatas = new ArrayList<>();
+SearchResponse scrollResp = 
es.getClient().prepareSearch(indexName).setTypes(metadataType).setScroll(new 
TimeValue(6)).setQuery(QueryBuilders.matchAllQuery()).setSize(100).execute()
+.actionGet();
+while (true) {
+  for (SearchHit hit : scrollResp.getHits().getHits()) {
+Map<String, Object> metadataA = hit.getSource();
+metadatas.add(metadataA);
+  }
+
+  scrollResp = 

[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393097#comment-16393097
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173494015
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/pre/NormalizeFeatures.java
 ##
 @@ -0,0 +1,110 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License"); you 
+ * may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+/**
+ * This package includes the preprocessing, processing, and data structure used
+ * by recommendation module.
+ */
+package org.apache.sdap.mudrod.recommendation.pre;
+
+import org.apache.sdap.mudrod.discoveryengine.DiscoveryStepAbstract;
+import org.apache.sdap.mudrod.driver.ESDriver;
+import org.apache.sdap.mudrod.driver.SparkDriver;
+import org.apache.sdap.mudrod.main.MudrodConstants;
+import org.apache.sdap.mudrod.recommendation.structure.MetadataFeature;
+import org.apache.sdap.mudrod.recommendation.structure.PODAACMetadataFeature;
+
+import org.elasticsearch.action.search.SearchResponse;
+import org.elasticsearch.action.update.UpdateRequest;
+import org.elasticsearch.common.unit.TimeValue;
+import org.elasticsearch.index.query.QueryBuilders;
+import org.elasticsearch.search.SearchHit;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Properties;
+import java.util.regex.Pattern;
+
+public class NormalizeFeatures extends DiscoveryStepAbstract {
+
+  /**
+   *
+   */
+  private static final long serialVersionUID = 1L;
+  private static final Logger LOG = LoggerFactory.getLogger(NormalizeFeatures.class);
+  // index name
+  private String indexName;
+  // type name of metadata in ES
+  private String metadataType;
+
+  /**
+   * Creates a new instance of OHEncoder.
+   *
+   * @param props the Mudrod configuration
+   * @param es an instantiated {@link ESDriver}
+   * @param spark an instantiated {@link SparkDriver}
+   */
+  public NormalizeFeatures(Properties props, ESDriver es, SparkDriver spark) {
+super(props, es, spark);
+indexName = props.getProperty(MudrodConstants.ES_INDEX_NAME);
+metadataType = MudrodConstants.RECOM_METADATA_TYPE;
+  }
+
+  @Override
+  public Object execute() {
+LOG.info("*proprocessing metadata feature starts**");
 
 Review comment:
   This logging is very messy. If I were you, I would remove it from as many 
places in the codebase as you can. It makes logs very convoluted and also 
makes some types of log analysis difficult, should we wish to do that in the 
future. 




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393116#comment-16393116
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #7: SDAP-35 Overhaul MUDROD 
configuration
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/7#discussion_r173496346
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/weblog/pre/HistoryGenerator.java
 ##
 @@ -79,13 +77,21 @@ public void generateBinaryMatrix() {
   String[] logIndices = logIndexList.toArray(new String[0]);
   String[] statictypeArray = new String[] { this.sessionStats };
   int docCount = es.getDocCount(logIndices, statictypeArray);
+  
+  LOG.info(this.sessionStats + ":" + docCount);  
 
 Review comment:
   Use parameterized logging
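
   For illustration, a minimal sketch of what this comment asks for. MUDROD
   logs through SLF4J, where the parameterized form of the flagged line would
   be LOG.info("{}:{}", this.sessionStats, docCount). To stay runnable with
   only the JDK, the sketch below uses java.util.logging instead, whose
   placeholders are {0}, {1}; the class name, method name and values are
   hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class ParameterizedLoggingSketch {

  // Package-private so a caller can attach its own handler; hypothetical name.
  static final Logger LOG = Logger.getLogger("ParameterizedLoggingSketch");

  /** Logs a type name and its document count using a message template. */
  static void logDocCount(String sessionStats, int docCount) {
    // String concatenation builds the message even when INFO is disabled.
    // The template form hands over the raw arguments and defers formatting
    // until a handler actually formats the record.
    LOG.log(Level.INFO, "{0}:{1}", new Object[] { sessionStats, docCount });
  }

  public static void main(String[] args) {
    logDocCount("sessionstats", 42); // made-up values for the example
  }
}
```

   The template also keeps the message text constant across calls, which
   tends to make later log analysis simpler than free-form concatenation.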




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390011#comment-16390011
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172945574
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/structure/MetadataFeature.java
 ##
 @@ -0,0 +1,71 @@
+package org.apache.sdap.mudrod.recommendation.structure;
 
 Review comment:
   Add license header




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390013#comment-16390013
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172946598
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/ssearch/ranking/DataGenerator.java
 ##
 @@ -136,7 +135,7 @@ public static void parseFile() {
* @param arr the parsed contents of the original CSV file
*/
   public static void calculateVec(String[][] arr) {
-List listofLists = new ArrayList<>(); // Holds calculations 
+List listofLists = new ArrayList(); // Holds calculations
 
 Review comment:
   Use diamond
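
   A small sketch of the requested form: the diamond operator (<>, Java 7 and
   later) lets the compiler infer the type arguments from the declared type,
   whereas a raw "new ArrayList()" discards them and triggers an unchecked
   warning. The element type List<Double> and all names below are assumptions,
   since the quoted diff does not show the generic parameters.

```java
import java.util.ArrayList;
import java.util.List;

public class DiamondSketch {

  /** Returns n empty rows; the row type is assumed for illustration. */
  static List<List<Double>> emptyRows(int n) {
    // Diamond: the compiler infers ArrayList<List<Double>> from the left side.
    List<List<Double>> listOfLists = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      List<Double> row = new ArrayList<>(); // inferred as ArrayList<Double>
      listOfLists.add(row);
    }
    return listOfLists;
  }

  public static void main(String[] args) {
    System.out.println(emptyRows(3).size());
  }
}
```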




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390014#comment-16390014
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172945847
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/structure/PODAACMetadataFeature.java
 ##
 @@ -0,0 +1,360 @@
+package org.apache.sdap.mudrod.recommendation.structure;
 
 Review comment:
   License header




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390003#comment-16390003
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172928135
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/ontology/pre/AggregateTriples.java
 ##
 @@ -167,17 +169,17 @@ public Element findChild(String str, Element ele) {
   public void getAllClass() throws IOException {
 List classElements = rootNode.getChildren("Class", Namespace.getNamespace("owl", owl_namespace));
 
-for (Object classElement1 : classElements) {
-  Element classElement = (Element) classElement1;
+for (int i = 0; i < classElements.size(); i++) {
+  Element classElement = (Element) classElements.get(i);
   String className = classElement.getAttributeValue("about", Namespace.getNamespace("rdf", rdf_namespace));
 
   if (className == null) {
 className = classElement.getAttributeValue("ID", Namespace.getNamespace("rdf", rdf_namespace));
   }
 
   List subclassElements = classElement.getChildren("subClassOf", Namespace.getNamespace("rdfs", rdfs_namespace));
-  for (Object subclassElement1 : subclassElements) {
-Element subclassElement = (Element) subclassElement1;
+  for (int j = 0; j < subclassElements.size(); j++) {
 
 Review comment:
   Revert this
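
   The change being reverted replaces an enhanced for loop with an index loop.
   As a hedged sketch of why the original form is usually preferred: the
   enhanced loop has no index to get wrong and works on any Iterable, not only
   on lists with a cheap get(i). The sketch uses plain objects instead of JDOM
   Element instances, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ForEachSketch {

  /** Collects the string form of every non-null element. */
  static List<String> nonNullNames(List<?> rawElements) {
    List<String> names = new ArrayList<>();
    // Same shape as the loop the review asks to restore.
    for (Object rawElement : rawElements) {
      if (rawElement != null) {
        names.add(rawElement.toString());
      }
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println(nonNullNames(Arrays.asList("Ocean", null, "Wind")));
  }
}
```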




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389973#comment-16389973
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172924264
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/main/MudrodConstants.java
 ##
 @@ -13,63 +13,73 @@
  */
 package org.apache.sdap.mudrod.main;
 
-import org.apache.sdap.mudrod.ontology.Ontology;
-
 /**
  * Class contains static constant keys and values relating to Mudrod
  * configuration properties. Property values are read from <a href="https://github.com/mudrod/mudrod/blob/master/core/src/main/resources/config.xml">config.xml</a>
  */
 public interface MudrodConstants {
 
-  public static final String CLEANUP_TYPE_PREFIX = "Cleanup_type_prefix";
-
-  public static final String CLICK_STREAM_LINKAGE_TYPE = "clickStreamLinkageType";
+  public static final String CLEANUP_TYPE = "cleanupLog";
 
-  public static final String CLICK_STREAM_MATRIX_TYPE = "clickStreamMatrixType";
+  public static final String CLICK_STREAM_LINKAGE_TYPE = "clickStreamLinkage";
 
-  public static final String CLICKSTREAM_SVD_DIM = "clickstreamSVDDimension";
+  public static final String CLICK_STREAM_MATRIX_TYPE = "clickStreamMatrix";
 
-  public static final String CLICKSTREAM_W = "clickStream_w";
+  public static final String CLICKSTREAM_SVD_DIM = "mudrod.clickstream.svd.d";
 
-  public static final String COMMENT_TYPE = "commentType";
+  public static final String CLICKSTREAM_W = "mudrod.clickstream.weight";
+  
+  public static final String CLICKSTREAM_PATH = "mudrod.clickstream.path";
+  
+  public static final String CLICKSTREAM_SVD_PATH = "mudrod.clickstream.svd.path";
 
   /** Defined on CLI */
   public static final String DATA_DIR = "dataDir";
 
-  public static final String DOWNLOAD_F = "downloadf";
+  public static final String DOWNLOAD_WEIGHT = "mudrod.download.weight";
 
-  public static final String DOWNLOAD_WEIGHT = "downloadWeight";
+  public static final String ES_CLUSTER = "mudrod.cluster.name";
 
-  public static final String ES_CLUSTER = "clusterName";
+  public static final String ES_TRANSPORT_TCP_PORT = "mudrod.es.transport.tcp.port";
 
-  public static final String ES_TRANSPORT_TCP_PORT = "ES_Transport_TCP_Port";
+  public static final String ES_UNICAST_HOSTS = "mudrod.es.unicast.hosts";
 
-  public static final String ES_UNICAST_HOSTS = "ES_unicast_hosts";
+  public static final String ES_HTTP_PORT = "mudrod.es.http.port";
 
-  public static final String ES_HTTP_PORT = "ES_HTTP_port";
+  public static final String ES_INDEX_NAME = "mudrod.es.index";
 
-  public static final String ES_INDEX_NAME = "indexName";
+  public static final String FTP_PREFIX = "mudrod.ftp.prefix";
 
-  public static final String FTP_PREFIX = "ftpPrefix";
+  public static final String FTP_TYPE = "rawftp";
+  
+  public static final String FTP_LOG = "ftp";
 
-  public static final String FTP_TYPE_PREFIX = "FTP_type_prefix";
+  public static final String HTTP_PREFIX = "mudrod.http.prefix";
 
-  public static final String HTTP_PREFIX = "httpPrefix";
+  public static final String HTTP_TYPE = "rawhttp";
 
 Review comment:
   Should be ```raw.http```




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389979#comment-16389979
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172923924
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/main/MudrodConstants.java
 ##
 @@ -13,63 +13,73 @@
  */
 package org.apache.sdap.mudrod.main;
 
-import org.apache.sdap.mudrod.ontology.Ontology;
-
 /**
  * Class contains static constant keys and values relating to Mudrod
  * configuration properties. Property values are read from <a href="https://github.com/mudrod/mudrod/blob/master/core/src/main/resources/config.xml">config.xml</a>
  */
 public interface MudrodConstants {
 
-  public static final String CLEANUP_TYPE_PREFIX = "Cleanup_type_prefix";
-
-  public static final String CLICK_STREAM_LINKAGE_TYPE = "clickStreamLinkageType";
+  public static final String CLEANUP_TYPE = "cleanupLog";
 
-  public static final String CLICK_STREAM_MATRIX_TYPE = "clickStreamMatrixType";
+  public static final String CLICK_STREAM_LINKAGE_TYPE = "clickStreamLinkage";
 
-  public static final String CLICKSTREAM_SVD_DIM = "clickstreamSVDDimension";
+  public static final String CLICK_STREAM_MATRIX_TYPE = "clickStreamMatrix";
 
-  public static final String CLICKSTREAM_W = "clickStream_w";
+  public static final String CLICKSTREAM_SVD_DIM = "mudrod.clickstream.svd.d";
 
-  public static final String COMMENT_TYPE = "commentType";
+  public static final String CLICKSTREAM_W = "mudrod.clickstream.weight";
+  
+  public static final String CLICKSTREAM_PATH = "mudrod.clickstream.path";
+  
+  public static final String CLICKSTREAM_SVD_PATH = "mudrod.clickstream.svd.path";
 
   /** Defined on CLI */
   public static final String DATA_DIR = "dataDir";
 
 Review comment:
   should be ```data.dir```
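
   To illustrate the naming convention this comment steers towards, here is a
   minimal sketch of dot-separated keys read through java.util.Properties. The
   key mudrod.es.index appears in the diff above and data.dir is the suggested
   key; the values and the class name are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class DottedKeysSketch {

  static final String ES_INDEX_NAME = "mudrod.es.index"; // key from the diff
  static final String DATA_DIR = "data.dir";             // suggested key

  /** Parses properties text; a real run would read config.properties instead. */
  static Properties parse(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  public static void main(String[] args) {
    // Made-up values, only to show the lookup.
    Properties props = parse("mudrod.es.index = mudrod\ndata.dir = /tmp/mudrod\n");
    System.out.println(props.getProperty(ES_INDEX_NAME));
    System.out.println(props.getProperty(DATA_DIR, "."));
  }
}
```

   Keeping every key in one dotted, lower-case scheme means a constant such as
   DATA_DIR reads the same way in code, in the properties file and on the
   command line.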




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389989#comment-16389989
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172928508
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/ontology/process/OwlParser.java
 ##
 @@ -17,13 +17,15 @@
 import org.apache.jena.ontology.OntClass;
 import org.apache.jena.ontology.OntModel;
 import org.apache.jena.rdf.model.Literal;
-import org.apache.sdap.mudrod.ontology.Ontology;
 
 import com.esotericsoftware.minlog.Log;
 
+import org.apache.sdap.mudrod.ontology.Ontology;
+
 import java.util.ArrayList;
 import java.util.Iterator;
 import java.util.List;
+import java.util.Set;
 
 Review comment:
   Is this Set being used? If not, don't add it.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389992#comment-16389992
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172946238
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/ssearch/Searcher.java
 ##
 @@ -260,19 +259,19 @@ public String ssearch(String index, String type, String 
query, String queryOpera
 Gson gson = new Gson();
 List fileList = new ArrayList<>();
 
-for (SResult aLi : li) {
+for (int i = 0; i < li.size(); i++) {
 
 Review comment:
   Revert this




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389964#comment-16389964
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172922568
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/driver/SparkDriver.java
 ##
 @@ -19,11 +19,43 @@
 import org.apache.spark.serializer.KryoSerializer;
 import org.apache.spark.sql.SQLContext;
 
+import java.io.File;
 import java.io.Serializable;
+import java.net.URISyntaxException;
 import java.util.Properties;
+//import org.apache.spark.sql.SparkSession;
 
 public class SparkDriver implements Serializable {
 
+  //TODO the commented out code below is the API uprgade
 
 Review comment:
   Remove this please. 




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389975#comment-16389975
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172924387
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/main/MudrodConstants.java
 ##
 @@ -80,44 +90,80 @@
*/
   public static final String ONTOLOGY_IMPL = MUDROD + "ontology.implementation";
 
-  public static final String ONTOLOGY_LINKAGE_TYPE = "ontologyLinkageType";
+  public static final String ONTOLOGY_LINKAGE_TYPE = "ontologyLinkage";
 
 Review comment:
   Should be ```ontology.linkage```




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389970#comment-16389970
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172927739
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/metadata/structure/MetadataExtractor.java
 ##
 @@ -91,7 +91,10 @@ public MetadataExtractor() {
 } catch (InterruptedException | ExecutionException e) {
   e.printStackTrace();
 
-}
+}*/
+
+// change PODAACMetadata class for other kind of metadata !!! important
 
 Review comment:
   If this is an issue then log an issue in JIRA.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390006#comment-16390006
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172926174
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/main/MudrodEngine.java
 ##
 @@ -138,48 +135,38 @@ private InputStream locateConfig() {
   LOG.info("Loaded config file from " + configFile.getAbsolutePath());
   return configStream;
 } catch (IOException e) {
-  LOG.info("File specified by environment variable " + MudrodConstants.MUDROD_CONFIG + "=\'" + configLocation + "\' could not be loaded. " + e.getMessage());
+  LOG.info("File specified by environment variable " + MudrodConstants.MUDROD_CONFIG + "=\'" + configLocation + "\' could not be loaded. Default configuration will be used." + e.getMessage());
 }
 
-InputStream configStream = MudrodEngine.class.getClassLoader().getResourceAsStream("config.xml");
+InputStream configStream = MudrodEngine.class.getClassLoader().getResourceAsStream("config.properties");
 
 if (configStream != null) {
-  LOG.info("Loaded config file from {}", MudrodEngine.class.getClassLoader().getResource("config.xml").getPath());
+  LOG.info("Loaded config file from {}", MudrodEngine.class.getClassLoader().getResource("config.properties").getPath());
 }
 
 return configStream;
   }
 
   /**
* Load the configuration provided at https://github.com/mudrod/mudrod/blob/master/core/src/main/resources/config.xml;>config.xml.
+   * 
"https://github.com/mudrod/mudrod/blob/master/core/src/main/resources/config.properties;>config.properties.
 
 Review comment:
   The source code now lives at Apache. Either remove the URL or link to it 
correctly.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Overhaul MUDROD configuration
> -
>
> Key: SDAP-35
> URL: https://issues.apache.org/jira/browse/SDAP-35
> Project: Apache Science Data Analytics Platform
>  Issue Type: Task
>  Components: mudrod
>Reporter: Lewis John McGibbney
>Priority: Major
>
> [~Yongyao] please augment the description here with your intended patch as 
> per https://github.com/aist-oceanworks/mudrod/pull/215
> Also, please name your branch and commit message after the issue you create 
> in JIRA. It makes things much easier as we try to improve the quality of our 
> source code review and development workflow. Thank you



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389967#comment-16389967
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172922379
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/driver/ESDriver.java
 ##
 @@ -561,6 +569,9 @@ public int getDocCount(String[] index, String[] type) {
 return this.getDocCount(index, type, search);
   }
 
+  /*
+   * Get the number of docs in a type of a index
 
 Review comment:
   If you are going to provide Javadoc (which I highly suggest), please also 
document the parameters and the return value.
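
   As a rough sketch of what such a comment could look like: the block below opens with `/**` and documents both parameters and the return value. The class `DocCountExample` and its in-memory list are hypothetical stand-ins so the snippet compiles on its own; the real `ESDriver` method queries Elasticsearch instead.

```java
import java.util.Arrays;
import java.util.List;

class DocCountExample {
  // Hypothetical in-memory stand-in for the index: each entry is {index, type}.
  private final List<String[]> docs;

  DocCountExample(List<String[]> docs) {
    this.docs = docs;
  }

  /**
   * Get the number of documents of the given types in the given indices.
   *
   * @param index names of the indices to search
   * @param type document types to count within those indices
   * @return the number of documents whose index and type both match
   */
  int getDocCount(String[] index, String[] type) {
    List<String> indices = Arrays.asList(index);
    List<String> types = Arrays.asList(type);
    int count = 0;
    for (String[] doc : docs) {
      if (indices.contains(doc[0]) && types.contains(doc[1])) {
        count++;
      }
    }
    return count;
  }
}
```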




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1639#comment-1639
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172924793
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/main/MudrodEngine.java
 ##
 @@ -52,11 +47,13 @@
 import java.util.zip.ZipEntry;
 import java.util.zip.ZipInputStream;
 
+import static org.apache.sdap.mudrod.main.MudrodConstants.DATA_DIR;
+
 /**
  * Main entry point for Running the Mudrod system. Invocation of this class is
  * tightly linked to the primary Mudrod configuration which can be located at
  * <a href="https://github.com/mudrod/mudrod/blob/master/core/src/main/resources/config.xml">config.xml</a>.
+ * <a href=
"https://github.com/mudrod/mudrod/blob/master/core/src/main/resources/config.properties">config.properties</a>.
 
 Review comment:
   This URL is incorrect. The code now lives at Apache.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390010#comment-16390010
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172946142
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/ssearch/Searcher.java
 ##
 @@ -16,17 +16,18 @@
 import com.google.gson.Gson;
 import com.google.gson.JsonElement;
 import com.google.gson.JsonObject;
-
 import org.apache.sdap.mudrod.discoveryengine.MudrodAbstract;
 import org.apache.sdap.mudrod.driver.ESDriver;
 import org.apache.sdap.mudrod.driver.SparkDriver;
 import org.apache.sdap.mudrod.ssearch.structure.SResult;
+
 import org.elasticsearch.action.search.SearchRequestBuilder;
 import org.elasticsearch.action.search.SearchResponse;
 import org.elasticsearch.index.query.BoolQueryBuilder;
 import org.elasticsearch.index.query.QueryBuilder;
 import org.elasticsearch.index.query.QueryBuilders;
 import org.elasticsearch.search.SearchHit;
+import org.elasticsearch.search.sort.SortBuilder;
 
 Review comment:
   If this is not being used, don't add it




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389995#comment-16389995
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172945711
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/structure/MetadataTokenizer.java
 ##
 @@ -18,7 +18,7 @@
 import java.io.Serializable;
 import java.util.*;
 
 Review comment:
   Don't use wildcard imports.
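
   A minimal sketch of the alternative, with each `java.util` type imported by name. `ExplicitImportsExample` and its `tokenize` method are hypothetical; the types `MetadataTokenizer` actually needs are not visible in this diff.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class ExplicitImportsExample implements Serializable {
  private static final long serialVersionUID = 1L;

  /** Splits a comma-separated string into trimmed, non-empty tokens. */
  static List<String> tokenize(String csv) {
    List<String> tokens = new ArrayList<>();
    for (String part : csv.split(",")) {
      String token = part.trim();
      if (!token.isEmpty()) {
        tokens.add(token);
      }
    }
    return tokens;
  }
}
```

   With explicit imports, the header of the file shows exactly which classes it depends on.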




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389980#comment-16389980
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172923662
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/main/MudrodConstants.java
 ##
 @@ -13,63 +13,73 @@
  */
 package org.apache.sdap.mudrod.main;
 
-import org.apache.sdap.mudrod.ontology.Ontology;
-
 /**
  * Class contains static constant keys and values relating to Mudrod
  * configuration properties. Property values are read from <a href="https://github.com/mudrod/mudrod/blob/master/core/src/main/resources/config.xml">config.xml</a>
  */
 public interface MudrodConstants {
 
-  public static final String CLEANUP_TYPE_PREFIX = "Cleanup_type_prefix";
-
-  public static final String CLICK_STREAM_LINKAGE_TYPE = 
"clickStreamLinkageType";
+  public static final String CLEANUP_TYPE = "cleanupLog";
 
 Review comment:
   Here the value is camelCased. It should be 
   ```
   cleanup.log
   ```
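
   To illustrate the convention, here is a small sketch that pairs lowercase, dot-separated names with `java.util.Properties`. `PropertyKeyExample`, the `parse` helper, and the sample value `mudrod` are assumptions for illustration only; `mudrod.es.index` is a key from this pull request.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertyKeyExample {
  // Key taken from the diff under review.
  static final String ES_INDEX_NAME = "mudrod.es.index";
  // Value written in the dot-separated form suggested in the review.
  static final String CLEANUP_TYPE = "cleanup.log";

  /** Parses properties-file text; a stand-in for reading config.properties. */
  static Properties parse(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }
}
```

   A caller would then look values up through the constant, for example `props.getProperty(PropertyKeyExample.ES_INDEX_NAME)`.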




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389991#comment-16389991
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172928233
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/ontology/process/EsipCOROntology.java
 ##
 @@ -13,10 +13,10 @@
  */
 package org.apache.sdap.mudrod.ontology.process;
 
-import java.util.Iterator;
-
 import org.apache.sdap.mudrod.ontology.Ontology;
 
+import java.util.Iterator;
 
 Review comment:
   Is this being used? If not, don't add it.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389961#comment-16389961
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172921684
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/discoveryengine/WeblogDiscoveryEngine.java
 ##
 @@ -99,11 +99,11 @@ public void preprocess() {
 
 ArrayList inputList = (ArrayList) 
getFileList(props.getProperty(MudrodConstants.DATA_DIR));
 
-for (String anInputList : inputList) {
-  timeSuffix = anInputList;
+for (int i = 0; i < inputList.size(); i++) {
 
 Review comment:
   This proposed change is actually worse for execution. Iterating with the 
enhanced for loop, as the code already does, is more performant. Your proposal 
slows it down; please revert your change.
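
   For reference, a sketch of the two loop forms side by side. `LoopStyleExample` and its method names are hypothetical. Both methods visit the same elements in the same order; the snippet only shows that the enhanced for loop needs no index bookkeeping and does not measure performance.

```java
import java.util.ArrayList;
import java.util.List;

class LoopStyleExample {
  /** Enhanced for loop, as in the code before the change. */
  static List<String> withForEach(List<String> inputList) {
    List<String> suffixes = new ArrayList<>();
    for (String anInput : inputList) {
      suffixes.add(anInput);
    }
    return suffixes;
  }

  /** Index-based loop, as in the proposed change. */
  static List<String> withIndex(List<String> inputList) {
    List<String> suffixes = new ArrayList<>();
    for (int i = 0; i < inputList.size(); i++) {
      suffixes.add(inputList.get(i));
    }
    return suffixes;
  }
}
```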




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389976#comment-16389976
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172926779
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/metadata/pre/ApiHarvester.java
 ##
 @@ -148,7 +150,7 @@ private void harvestMetadatafromWeb() {
 int docId = startIndex + i;
 File itemfile = new 
File(props.getProperty(MudrodConstants.RAW_METADATA_PATH) + "/" + docId + 
".json");
 
-try (FileWriter fw = new FileWriter(itemfile.getAbsoluteFile()); 
BufferedWriter bw = new BufferedWriter(fw)) {
+try (FileWriter fw = new FileWriter(itemfile.getAbsoluteFile()); 
BufferedWriter bw = new BufferedWriter(fw);) {
 
 Review comment:
   I don't think there is any reason to add the semicolon.
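
   As a sketch of the form without the trailing semicolon: the two resources are separated by a single `;` and nothing follows the last one. `TryWithResourcesExample` and the temporary file are hypothetical stand-ins for the harvester's output file.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class TryWithResourcesExample {
  /** Writes json to a temporary file and returns what was written. */
  static String writeAndReadBack(String json) {
    try {
      File itemfile = File.createTempFile("doc", ".json");
      itemfile.deleteOnExit();
      // One semicolon separates the resources; none is needed after the last.
      try (FileWriter fw = new FileWriter(itemfile.getAbsoluteFile());
          BufferedWriter bw = new BufferedWriter(fw)) {
        bw.write(json);
      }
      // Both writers are closed (and flushed) before the file is read back.
      return new String(Files.readAllBytes(itemfile.toPath()), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```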




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389987#comment-16389987
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172946684
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/ssearch/ranking/DataGenerator.java
 ##
 @@ -145,6 +144,7 @@ public static void calculateVec(String[][] arr) {
 List colList = new ArrayList(); // create vector to 
store all values inside of a column, which is stored inside 2D vector
 for (int col = 0; col < arr[0].length - 1; col++) // Columns go until 
the next to last column
 {
+  //System.out.println(col + " " + arr[row][col]);
 
 Review comment:
   Remove line




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389974#comment-16389974
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172924336
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/main/MudrodConstants.java
 ##
 @@ -13,63 +13,73 @@
  */
 package org.apache.sdap.mudrod.main;
 
-import org.apache.sdap.mudrod.ontology.Ontology;
-
 /**
  * Class contains static constant keys and values relating to Mudrod
  * configuration properties. Property values are read from <a href="https://github.com/mudrod/mudrod/blob/master/core/src/main/resources/config.xml">config.xml</a>
  */
 public interface MudrodConstants {
 
-  public static final String CLEANUP_TYPE_PREFIX = "Cleanup_type_prefix";
-
-  public static final String CLICK_STREAM_LINKAGE_TYPE = 
"clickStreamLinkageType";
+  public static final String CLEANUP_TYPE = "cleanupLog";
 
-  public static final String CLICK_STREAM_MATRIX_TYPE = 
"clickStreamMatrixType";
+  public static final String CLICK_STREAM_LINKAGE_TYPE = "clickStreamLinkage";
 
-  public static final String CLICKSTREAM_SVD_DIM = "clickstreamSVDDimension";
+  public static final String CLICK_STREAM_MATRIX_TYPE = "clickStreamMatrix";
 
-  public static final String CLICKSTREAM_W = "clickStream_w";
+  public static final String CLICKSTREAM_SVD_DIM = "mudrod.clickstream.svd.d";
 
-  public static final String COMMENT_TYPE = "commentType";
+  public static final String CLICKSTREAM_W = "mudrod.clickstream.weight";
+  
+  public static final String CLICKSTREAM_PATH = "mudrod.clickstream.path";
+  
+  public static final String CLICKSTREAM_SVD_PATH = 
"mudrod.clickstream.svd.path";
 
   /** Defined on CLI */
   public static final String DATA_DIR = "dataDir";
 
-  public static final String DOWNLOAD_F = "downloadf";
+  public static final String DOWNLOAD_WEIGHT = "mudrod.download.weight";
 
-  public static final String DOWNLOAD_WEIGHT = "downloadWeight";
+  public static final String ES_CLUSTER = "mudrod.cluster.name";
 
-  public static final String ES_CLUSTER = "clusterName";
+  public static final String ES_TRANSPORT_TCP_PORT = 
"mudrod.es.transport.tcp.port";
 
-  public static final String ES_TRANSPORT_TCP_PORT = "ES_Transport_TCP_Port";
+  public static final String ES_UNICAST_HOSTS = "mudrod.es.unicast.hosts";
 
-  public static final String ES_UNICAST_HOSTS = "ES_unicast_hosts";
+  public static final String ES_HTTP_PORT = "mudrod.es.http.port";
 
-  public static final String ES_HTTP_PORT = "ES_HTTP_port";
+  public static final String ES_INDEX_NAME = "mudrod.es.index";
 
-  public static final String ES_INDEX_NAME = "indexName";
+  public static final String FTP_PREFIX = "mudrod.ftp.prefix";
 
-  public static final String FTP_PREFIX = "ftpPrefix";
+  public static final String FTP_TYPE = "rawftp";
+  
+  public static final String FTP_LOG = "ftp";
 
-  public static final String FTP_TYPE_PREFIX = "FTP_type_prefix";
+  public static final String HTTP_PREFIX = "mudrod.http.prefix";
 
-  public static final String HTTP_PREFIX = "httpPrefix";
+  public static final String HTTP_TYPE = "rawhttp";
+  
+  public static final String HTTP_LOG = "http";
+  
+  public static final String BASE_URL = "mudrod.base.url";
+  
+  public static final String BLACK_LIST_REQUEST = "mudrod.black.request.list";
+  
+  public static final String BLACK_LIST_AGENT = "mudrod.black.agent.list";
 
-  public static final String HTTP_TYPE_PREFIX = "HTTP_type_prefix";
+  public static final String LOG_INDEX = "mudrod.log.index";
 
-  public static final String LOG_INDEX = "logIndexName";
+  public static final String METADATA_LINKAGE_TYPE = "MetadataLinkage";
 
 Review comment:
   Should be ```metadata.linkage```



[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390008#comment-16390008
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172945802
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/structure/MetadataTokenizer.java
 ##
 @@ -109,10 +106,12 @@ public MetadataOpt(Properties props) {
 
   for (SearchHit hit : scrollResp.getHits().getHits()) {
 Map result = hit.getSource();
-String shortName = (String) result.get("Dataset-ShortName");
+String shortName = (String) result.get(metadataName);
 
 String filedStr = "";
-for (String filed : variables) {
+int size = variables.size();
 
 Review comment:
   Revert this




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389962#comment-16389962
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172922237
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/driver/ESDriver.java
 ##
 @@ -223,7 +229,9 @@ public void deleteType(String index, String type) {
 String[] indices = client.admin().indices().getIndex(new 
GetIndexRequest()).actionGet().getIndices();
 
 ArrayList indexList = new ArrayList<>();
-for (String indexName : indices) {
+int length = indices.length;
 
 Review comment:
   Please revert this




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389994#comment-16389994
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172945971
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/structure/PODAACMetadataFeature.java
 ##
 @@ -0,0 +1,360 @@
+package org.apache.sdap.mudrod.recommendation.structure;
+
+import org.apache.sdap.mudrod.driver.ESDriver;
+import org.apache.sdap.mudrod.driver.SparkDriver;
+import org.apache.sdap.mudrod.main.MudrodConstants;
+import org.apache.sdap.mudrod.utils.LabeledRowMatrix;
+import org.apache.sdap.mudrod.utils.MatrixUtil;
+import org.apache.spark.api.java.JavaPairRDD;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.function.PairFunction;
+import org.apache.spark.mllib.linalg.distributed.RowMatrix;
+import org.elasticsearch.action.search.SearchResponse;
+import org.elasticsearch.common.unit.TimeValue;
+import org.elasticsearch.common.xcontent.XContentBuilder;
+import org.elasticsearch.index.query.QueryBuilders;
+import org.elasticsearch.search.SearchHit;
+import scala.Tuple2;
+import scala.tools.nsc.transform.SpecializeTypes.Abstract;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.*;
+import java.util.regex.Pattern;
+
+public class PODAACMetadataFeature extends MetadataFeature {
+
 
 Review comment:
   Formatting should use 2-space indentation.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389986#comment-16389986
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172928327
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/ontology/process/LocalOntology.java
 ##
 @@ -116,10 +117,12 @@ public void load() {
*/
   @Override
   public void load(String[] urls) {
-for (String url1 : urls) {
-  String url = url1.trim();
-  if (!"".equals(url) && LOG.isInfoEnabled())
-LOG.info("Reading and processing {}", url);
+for (int i = 0; i < urls.length; i++) {
 
 Review comment:
   Revert this.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389985#comment-16389985
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172925960
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/main/MudrodEngine.java
 ##
 @@ -138,48 +135,38 @@ private InputStream locateConfig() {
   LOG.info("Loaded config file from " + configFile.getAbsolutePath());
   return configStream;
 } catch (IOException e) {
-  LOG.info("File specified by environment variable " + MudrodConstants.MUDROD_CONFIG + "=\'" + configLocation + "\' could not be loaded. " + e.getMessage());
+  LOG.info("File specified by environment variable " + MudrodConstants.MUDROD_CONFIG + "=\'" + configLocation + "\' could not be loaded. Default configuration will be used." + e.getMessage());
 
 Review comment:
   For any logging call that involves parameter substitution, you should use
   [parameterized logging](https://www.slf4j.org/faq.html#logging_performance), e.g.
   ```
   LOG.error("File specified by environment variable {} = '{}' could not be loaded. Default configuration will be used.", MudrodConstants.MUDROD_CONFIG, configLocation, e);
   ```
   Note that it should also be LOG.error, not LOG.info. The exception is passed
   as the last argument without a placeholder so that SLF4J logs it with its
   stack trace; a trailing e.getMessage() string with no matching {} would be
   dropped.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390012#comment-16390012
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172945091
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/pre/NormalizeFeatures.java
 ##
 @@ -0,0 +1,110 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License"); you 
+ * may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+/**
+ * This package includes the preprocessing, processing, and data structure used
+ * by recommendation module.
+ */
+package org.apache.sdap.mudrod.recommendation.pre;
+
+import org.apache.sdap.mudrod.discoveryengine.DiscoveryStepAbstract;
+import org.apache.sdap.mudrod.driver.ESDriver;
+import org.apache.sdap.mudrod.driver.SparkDriver;
+import org.apache.sdap.mudrod.main.MudrodConstants;
+import org.apache.sdap.mudrod.recommendation.structure.MetadataFeature;
+import org.apache.sdap.mudrod.recommendation.structure.PODAACMetadataFeature;
+
+import org.elasticsearch.action.search.SearchResponse;
+import org.elasticsearch.action.update.UpdateRequest;
+import org.elasticsearch.common.unit.TimeValue;
+import org.elasticsearch.index.query.QueryBuilders;
+import org.elasticsearch.search.SearchHit;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Properties;
+import java.util.regex.Pattern;
+
+public class NormalizeFeatures extends DiscoveryStepAbstract {
+
+  /**
+   *
+   */
+  private static final long serialVersionUID = 1L;
+  private static final Logger LOG = 
LoggerFactory.getLogger(NormalizeFeatures.class);
+  // index name
+  private String indexName;
+  // type name of metadata in ES
+  private String metadataType;
+
+  /**
+   * Creates a new instance of OHEncoder.
+   *
+   * @param props the Mudrod configuration
+   * @param es an instantiated {@link ESDriver}
+   * @param spark an instantiated {@link SparkDriver}
+   */
+  public NormalizeFeatures(Properties props, ESDriver es, SparkDriver spark) {
+super(props, es, spark);
+indexName = props.getProperty(MudrodConstants.ES_INDEX_NAME);
+metadataType = MudrodConstants.RECOM_METADATA_TYPE;
+  }
+
+  @Override
+  public Object execute() {
+LOG.info("*proprocessing metadata feature 
starts**");
 
 Review comment:
   Please never use logging like this. It looks ridiculous and as logs grow it 
becomes a PITA.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389977#comment-16389977
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172924479
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/main/MudrodConstants.java
 ##
 @@ -80,44 +90,80 @@
*/
   public static final String ONTOLOGY_IMPL = MUDROD + 
"ontology.implementation";
 
-  public static final String ONTOLOGY_LINKAGE_TYPE = "ontologyLinkageType";
+  public static final String ONTOLOGY_LINKAGE_TYPE = "ontologyLinkage";
 
-  public static final String ONTOLOGY_W = "ontology_w";
+  public static final String ONTOLOGY_W = "mudrod.ontology.weight";
+  
+  public static final String ONTOLOGY_PATH = "mudrod.ontology.path";
+  
+  public static final String ONTOLOGY_INPUT_PATH = 
"mudrod.ontology.input.path";
 
-  public static final String PROCESS_TYPE = "processingType";
+  public static final String PROCESS_TYPE = "mudrod.processing.type";
 
   /** Defined on CLI */
-  public static final String RAW_METADATA_PATH = "raw_metadataPath";
-
-  public static final String RAW_METADATA_TYPE = "raw_metadataType";
-
-  public static final String SEARCH_F = "searchf";
-
-  public static final String SENDING_RATE = "sendingrate";
-
-  public static final String SESSION_PORT = "SessionPort";
-
-  public static final String SESSION_STATS_PREFIX = "SessionStats_prefix";
-
-  public static final String SESSION_URL = "SessionUrl";
-
-  public static final String SPARK_APP_NAME = "spark.app.name";
-
-  public static final String SPARK_MASTER = "spark.master";
+  public static final String METADATA_DOWNLOAD = "mudrod.metadata.download";
+  
+  public static final String RAW_METADATA_PATH = "mudrod.metadata.path";
+
+  public static final String RAW_METADATA_TYPE = "mudrod.metadata.type";
+  
+  public static final String METADATA_MATRIX_PATH = 
"mudrod.metadata.matrix.path";
+  
+  public static final String METADATA_SVD_PATH = "mudrod.metadata.svd.path";
+  
+  public static final String RECOM_METADATA_TYPE = "recommedation.metadata";
+  
+  public static final String METADATA_ID = "mudrod.metadata.id";
+  
+  public static final String SEMANTIC_FIELDS = 
"mudrod.metadata.semantic.fields";
+  
+  public static final String METADATA_WORD_SIM_TYPE = "metadata.word.sim";
+  
+  public static final String METADATA_FEATURE_SIM_TYPE = 
"metadata.feature.sim";
+  
+  public static final String METADATA_SESSION_SIM_TYPE = 
"metadata.session.sim";
+  
+  public static final String METADATA_TERM_MATRIX_PATH = 
"metadata.term.matrix.path";
+  
+  public static final String METADATA_WORD_MATRIX_PATH = 
"metadata.word.matrix.path";
+  
+  public static final String METADATA_SESSION_MATRIX_PATH = 
"metadata.session.matrix.path";
+
+  public static final String REQUEST_RATE = "mudrod.request.rate";
+
+  public static final String SESSION_PORT = "mudrod.session.port";
+
+  public static final String SESSION_STATS_TYPE = "sessionstats";
 
 Review comment:
   Should be ```session.stats```




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389988#comment-16389988
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172945622
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/structure/MetadataFeature.java
 ##
 @@ -0,0 +1,71 @@
+package org.apache.sdap.mudrod.recommendation.structure;
+
+import org.apache.sdap.mudrod.driver.ESDriver;
+import org.apache.sdap.mudrod.driver.SparkDriver;
+import org.apache.sdap.mudrod.main.MudrodConstants;
+import org.apache.sdap.mudrod.utils.LabeledRowMatrix;
+import org.apache.sdap.mudrod.utils.MatrixUtil;
+import org.apache.spark.api.java.JavaPairRDD;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.function.PairFunction;
+import org.apache.spark.mllib.linalg.distributed.RowMatrix;
+import org.elasticsearch.action.search.SearchResponse;
+import org.elasticsearch.common.unit.TimeValue;
+import org.elasticsearch.common.xcontent.XContentBuilder;
+import org.elasticsearch.index.query.QueryBuilders;
+import org.elasticsearch.search.SearchHit;
+import scala.Tuple2;
+import scala.tools.nsc.transform.SpecializeTypes.Abstract;
+
+import java.io.Serializable;
+import java.util.*;
 
 Review comment:
   Don't use wildcard imports
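
To make the request concrete: a wildcard such as `java.util.*` hides which types a class depends on, and it can cause ambiguity when two imported packages export the same simple name (for example `java.util.List` and `java.awt.List`). The sketch below lists each import explicitly. Which `java.util` types `MetadataFeature` really needs is not visible in this hunk, so the four types imported here and the class `ExplicitImports` are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ExplicitImports {

  // Counts how often each term occurs, using only the explicitly imported types.
  static Map<String, Integer> countTerms(List<String> terms) {
    Map<String, Integer> counts = new HashMap<>();
    for (String term : terms) {
      counts.merge(term, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String> terms = new ArrayList<>();
    terms.add("ocean");
    terms.add("wind");
    terms.add("ocean");
    System.out.println(countTerms(terms));
  }
}
```

Most IDEs can expand a wildcard into this form automatically, so the change is usually mechanical.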




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390004#comment-16390004
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172928676
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/pre/ImportMetadata.java
 ##
 @@ -65,8 +64,8 @@ public void addMetadataMapping() {
 String mappingJson = "{\r\n   \"dynamic_templates\": " + "[\r\n  " + 
"{\r\n \"strings\": " + "{\r\n\"match_mapping_type\": 
\"string\","
 + "\r\n\"mapping\": {\r\n   \"type\": 
\"string\"," + "\r\n   \"analyzer\": \"csv\"\r\n}" + 
"\r\n }\r\n  }\r\n   ]\r\n}";
 
-es.getClient().admin().indices().preparePutMapping(props.getProperty(MudrodConstants.ES_INDEX_NAME)).setType(props.getProperty("recom_metadataType")).setSource(mappingJson).execute().actionGet();
-
+es.getClient().admin().indices().preparePutMapping(props.getProperty(MudrodConstants.ES_INDEX_NAME))
+.setType(MudrodConstants.RECOM_METADATA_TYPE).setSource(mappingJson).execute().actionGet();
 
 Review comment:
   Why is this now separated over a line break?




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389996#comment-16389996
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172928154
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/ontology/pre/AggregateTriples.java
 ##
 @@ -192,8 +194,8 @@ public void getAllClass() throws IOException {
   }
 
   List equalClassElements = classElement.getChildren("equivalentClass", 
Namespace.getNamespace("owl", owl_namespace));
-  for (Object equalClassElement1 : equalClassElements) {
-Element equalClassElement = (Element) equalClassElement1;
+  for (int k = 0; k < equalClassElements.size(); k++) {
 
 Review comment:
   Revert this




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389984#comment-16389984
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172924182
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/main/MudrodConstants.java
 ##
 @@ -13,63 +13,73 @@
  */
 package org.apache.sdap.mudrod.main;
 
-import org.apache.sdap.mudrod.ontology.Ontology;
-
 /**
  * Class contains static constant keys and values relating to Mudrod
  * configuration properties. Property values are read from <a href="https://github.com/mudrod/mudrod/blob/master/core/src/main/resources/config.xml">config.xml</a>
  */
 public interface MudrodConstants {
 
-  public static final String CLEANUP_TYPE_PREFIX = "Cleanup_type_prefix";
-
-  public static final String CLICK_STREAM_LINKAGE_TYPE = 
"clickStreamLinkageType";
+  public static final String CLEANUP_TYPE = "cleanupLog";
 
-  public static final String CLICK_STREAM_MATRIX_TYPE = 
"clickStreamMatrixType";
+  public static final String CLICK_STREAM_LINKAGE_TYPE = "clickStreamLinkage";
 
-  public static final String CLICKSTREAM_SVD_DIM = "clickstreamSVDDimension";
+  public static final String CLICK_STREAM_MATRIX_TYPE = "clickStreamMatrix";
 
-  public static final String CLICKSTREAM_W = "clickStream_w";
+  public static final String CLICKSTREAM_SVD_DIM = "mudrod.clickstream.svd.d";
 
-  public static final String COMMENT_TYPE = "commentType";
+  public static final String CLICKSTREAM_W = "mudrod.clickstream.weight";
+  
+  public static final String CLICKSTREAM_PATH = "mudrod.clickstream.path";
+  
+  public static final String CLICKSTREAM_SVD_PATH = 
"mudrod.clickstream.svd.path";
 
   /** Defined on CLI */
   public static final String DATA_DIR = "dataDir";
 
-  public static final String DOWNLOAD_F = "downloadf";
+  public static final String DOWNLOAD_WEIGHT = "mudrod.download.weight";
 
-  public static final String DOWNLOAD_WEIGHT = "downloadWeight";
+  public static final String ES_CLUSTER = "mudrod.cluster.name";
 
-  public static final String ES_CLUSTER = "clusterName";
+  public static final String ES_TRANSPORT_TCP_PORT = 
"mudrod.es.transport.tcp.port";
 
-  public static final String ES_TRANSPORT_TCP_PORT = "ES_Transport_TCP_Port";
+  public static final String ES_UNICAST_HOSTS = "mudrod.es.unicast.hosts";
 
-  public static final String ES_UNICAST_HOSTS = "ES_unicast_hosts";
+  public static final String ES_HTTP_PORT = "mudrod.es.http.port";
 
-  public static final String ES_HTTP_PORT = "ES_HTTP_port";
+  public static final String ES_INDEX_NAME = "mudrod.es.index";
 
-  public static final String ES_INDEX_NAME = "indexName";
+  public static final String FTP_PREFIX = "mudrod.ftp.prefix";
 
-  public static final String FTP_PREFIX = "ftpPrefix";
+  public static final String FTP_TYPE = "rawftp";
 
 Review comment:
   Should be ```raw.ftp```




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389990#comment-16389990
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172946203
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/ssearch/Searcher.java
 ##
 @@ -186,7 +186,7 @@ public Double exists(ArrayList strList, String 
query) {
   }
 
   ArrayList longdate = (ArrayList) 
result.get("DatasetCitation-ReleaseDateLong");
-  Date date = new Date(Long.valueOf(longdate.get(0)));
+  Date date = new Date(Long.valueOf(longdate.get(0)).longValue());
 
 Review comment:
   Why call long value?
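
Some background on the question: `Long.valueOf(String)` returns a boxed `Long`, and Java unboxes it automatically when it is passed to the `Date(long)` constructor, so the explicit `.longValue()` changes nothing. `Long.parseLong(String)` returns a primitive `long` directly and avoids the boxing step. The sketch below compares the three forms, assuming the list element is a `String` holding epoch milliseconds (the generic type is not visible in the diff). The class name `ReleaseDate` and the sample value are made up.

```java
import java.util.Date;

class ReleaseDate {

  static Date boxed(String millis) {
    // Long.valueOf returns a Long; the compiler unboxes it for Date(long).
    return new Date(Long.valueOf(millis));
  }

  static Date explicit(String millis) {
    // The form added in the patch: same result, redundant call.
    return new Date(Long.valueOf(millis).longValue());
  }

  static Date parsed(String millis) {
    // Long.parseLong yields a primitive long, so nothing is boxed at all.
    return new Date(Long.parseLong(millis));
  }

  public static void main(String[] args) {
    String millis = "1520899200000";
    System.out.println(boxed(millis).equals(explicit(millis)));
    System.out.println(boxed(millis).equals(parsed(millis)));
  }
}
```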




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389983#comment-16389983
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172924620
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/main/MudrodConstants.java
 ##
 @@ -80,44 +90,80 @@
*/
   public static final String ONTOLOGY_IMPL = MUDROD + 
"ontology.implementation";
 
-  public static final String ONTOLOGY_LINKAGE_TYPE = "ontologyLinkageType";
+  public static final String ONTOLOGY_LINKAGE_TYPE = "ontologyLinkage";
 
-  public static final String ONTOLOGY_W = "ontology_w";
+  public static final String ONTOLOGY_W = "mudrod.ontology.weight";
+  
+  public static final String ONTOLOGY_PATH = "mudrod.ontology.path";
+  
+  public static final String ONTOLOGY_INPUT_PATH = 
"mudrod.ontology.input.path";
 
-  public static final String PROCESS_TYPE = "processingType";
+  public static final String PROCESS_TYPE = "mudrod.processing.type";
 
   /** Defined on CLI */
-  public static final String RAW_METADATA_PATH = "raw_metadataPath";
-
-  public static final String RAW_METADATA_TYPE = "raw_metadataType";
-
-  public static final String SEARCH_F = "searchf";
-
-  public static final String SENDING_RATE = "sendingrate";
-
-  public static final String SESSION_PORT = "SessionPort";
-
-  public static final String SESSION_STATS_PREFIX = "SessionStats_prefix";
-
-  public static final String SESSION_URL = "SessionUrl";
-
-  public static final String SPARK_APP_NAME = "spark.app.name";
-
-  public static final String SPARK_MASTER = "spark.master";
+  public static final String METADATA_DOWNLOAD = "mudrod.metadata.download";
+  
+  public static final String RAW_METADATA_PATH = "mudrod.metadata.path";
+
+  public static final String RAW_METADATA_TYPE = "mudrod.metadata.type";
+  
+  public static final String METADATA_MATRIX_PATH = 
"mudrod.metadata.matrix.path";
+  
+  public static final String METADATA_SVD_PATH = "mudrod.metadata.svd.path";
+  
+  public static final String RECOM_METADATA_TYPE = "recommedation.metadata";
+  
+  public static final String METADATA_ID = "mudrod.metadata.id";
+  
+  public static final String SEMANTIC_FIELDS = 
"mudrod.metadata.semantic.fields";
+  
+  public static final String METADATA_WORD_SIM_TYPE = "metadata.word.sim";
+  
+  public static final String METADATA_FEATURE_SIM_TYPE = 
"metadata.feature.sim";
+  
+  public static final String METADATA_SESSION_SIM_TYPE = 
"metadata.session.sim";
+  
+  public static final String METADATA_TERM_MATRIX_PATH = 
"metadata.term.matrix.path";
+  
+  public static final String METADATA_WORD_MATRIX_PATH = 
"metadata.word.matrix.path";
+  
+  public static final String METADATA_SESSION_MATRIX_PATH = 
"metadata.session.matrix.path";
+
+  public static final String REQUEST_RATE = "mudrod.request.rate";
+
+  public static final String SESSION_PORT = "mudrod.session.port";
+
+  public static final String SESSION_STATS_TYPE = "sessionstats";
+
+  public static final String SESSION_URL = "mudrod.session.url";
+
+  public static final String SPARK_APP_NAME = "mudrod.spark.app.name";
+
+  public static final String SPARK_MASTER = "mudrod.spark.master";
   /**
* Absolute local location of javaSVMWithSGDModel directory. This is 
typically
* 
file:///usr/local/mudrod/core/src/main/resources/javaSVMWithSGDModel
*/
-  public static final String SVM_SGD_MODEL = "svmSgdModel";
+  public static final String RANKING_MODEL = "mudrod.ranking.model";
 
-  public static final String TIMEGAP = "timegap";
+  public static final String REQUEST_TIME_GAP = "mudrod.request.time.gap";
 
   public static final String TIME_SUFFIX = "TimeSuffix";
 
-  public static final String USE_HISTORY_LINKAGE_TYPE = 
"userHistoryLinkageType";
-
-  public static final String USER_HISTORY_W = "userHistory_w";
-
-  public static final String VIEW_F = "viewf";
+  public static final String USE_HISTORY_LINKAGE_TYPE = "userHistoryLinkage";
 
 Review comment:
   Should be ```user.history.linkage```



[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390016#comment-16390016
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172946540
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/ssearch/ranking/DataGenerator.java
 ##
 @@ -34,10 +33,10 @@
   private static boolean isMultFiles;
 
   private static String[] myHeader;
-  private static List myMasterList = new ArrayList<>();
+  private static List myMasterList = new 
ArrayList();
 
   // HashMap used for comparing evaluation classes
-  public static final Map map1 = new HashMap<>();
+  public static final HashMap map1 = new HashMap();
 
 Review comment:
   Return types should not be concrete implementations if possible.  Also use 
diamond operator
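
A sketch of both points follows: declare fields and return types with the interface (`List`, `Map`) and let the diamond operator infer the type arguments on the right-hand side. The generic parameters were lost from the diff above, so the element types used here (`String`, `Integer`) and the class name `EvaluationClasses` are placeholders rather than the project's actual declarations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EvaluationClasses {

  // Declared as the interface; callers do not depend on ArrayList or HashMap.
  private static final List<String> LABELS = new ArrayList<>();
  private static final Map<String, Integer> RANKS = new HashMap<>();

  static void register(String label, int rank) {
    LABELS.add(label);
    RANKS.put(label, rank);
  }

  // Returning the interface type lets the implementation change later.
  static Map<String, Integer> ranks() {
    return RANKS;
  }

  static List<String> labels() {
    return LABELS;
  }

  public static void main(String[] args) {
    register("Excellent", 7);
    System.out.println(labels() + " " + ranks());
  }
}
```

With this shape, swapping `HashMap` for another `Map` implementation touches one line and no callers.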




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389998#comment-16389998
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172945903
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/structure/PODAACMetadataFeature.java
 ##
 @@ -0,0 +1,360 @@
+package org.apache.sdap.mudrod.recommendation.structure;
+
+import org.apache.sdap.mudrod.driver.ESDriver;
+import org.apache.sdap.mudrod.driver.SparkDriver;
+import org.apache.sdap.mudrod.main.MudrodConstants;
+import org.apache.sdap.mudrod.utils.LabeledRowMatrix;
+import org.apache.sdap.mudrod.utils.MatrixUtil;
+import org.apache.spark.api.java.JavaPairRDD;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.function.PairFunction;
+import org.apache.spark.mllib.linalg.distributed.RowMatrix;
+import org.elasticsearch.action.search.SearchResponse;
+import org.elasticsearch.common.unit.TimeValue;
+import org.elasticsearch.common.xcontent.XContentBuilder;
+import org.elasticsearch.index.query.QueryBuilders;
+import org.elasticsearch.search.SearchHit;
+import scala.Tuple2;
+import scala.tools.nsc.transform.SpecializeTypes.Abstract;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.*;
 
 Review comment:
   Don't use wildcard imports




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389993#comment-16389993
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172927648
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/metadata/structure/MetadataExtractor.java
 ##
 @@ -67,15 +67,15 @@ public MetadataExtractor() {
* @param type  metadata type name
* @return metadata list
*/
-  protected List loadMetadataFromES(ESDriver es, String index, 
String type) {
+  protected List loadMetadataFromES(ESDriver es, String index, 
String type) {
 
-List metadatas = new ArrayList();
+List metadatas = new ArrayList();
 SearchResponse scrollResp = 
es.getClient().prepareSearch(index).setTypes(type).setQuery(QueryBuilders.matchAllQuery()).setScroll(new
 TimeValue(6)).setSize(100).execute().actionGet();
 
 while (true) {
   for (SearchHit hit : scrollResp.getHits().getHits()) {
 Map result = hit.getSource();
-String shortname = (String) result.get("Dataset-ShortName");
+/*String shortname = (String) result.get("Dataset-ShortName");
 
 Review comment:
   Remove this code. Never comment out code and leave it without a natural 
language comment explaining why.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389972#comment-16389972
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172922788
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/integration/LinkageIntegration.java
 ##
 @@ -54,7 +55,7 @@ public LinkageIntegration(Properties props, ESDriver es, 
SparkDriver spark) {
*/
   class LinkedTerm {
 String term = null;
-double weight = 0;
+double weight = 0.0;
 
 Review comment:
   I have a feeling that this is a useless assignment
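
The reviewer's hunch matches the Java language rules: instance fields are default-initialized, so a `double` field starts at `0.0` and a reference field starts at `null` without any explicit assignment. A small sketch, where the nested `LinkedTerm` mirrors the two fields shown in the diff and the enclosing class name is hypothetical:

```java
class FieldDefaultsExample {
  // Same two fields as in the diff, but without the explicit initializers.
  static class LinkedTerm {
    String term;
    double weight;
  }

  public static void main(String[] args) {
    LinkedTerm t = new LinkedTerm();
    System.out.println(t.term == null);   // reference fields default to null
    System.out.println(t.weight == 0.0);  // double fields default to 0.0
  }
}
```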




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389981#comment-16389981
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172924574
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/main/MudrodConstants.java
 ##
 @@ -80,44 +90,80 @@
*/
   public static final String ONTOLOGY_IMPL = MUDROD + 
"ontology.implementation";
 
-  public static final String ONTOLOGY_LINKAGE_TYPE = "ontologyLinkageType";
+  public static final String ONTOLOGY_LINKAGE_TYPE = "ontologyLinkage";
 
-  public static final String ONTOLOGY_W = "ontology_w";
+  public static final String ONTOLOGY_W = "mudrod.ontology.weight";
+  
+  public static final String ONTOLOGY_PATH = "mudrod.ontology.path";
+  
+  public static final String ONTOLOGY_INPUT_PATH = 
"mudrod.ontology.input.path";
 
-  public static final String PROCESS_TYPE = "processingType";
+  public static final String PROCESS_TYPE = "mudrod.processing.type";
 
   /** Defined on CLI */
-  public static final String RAW_METADATA_PATH = "raw_metadataPath";
-
-  public static final String RAW_METADATA_TYPE = "raw_metadataType";
-
-  public static final String SEARCH_F = "searchf";
-
-  public static final String SENDING_RATE = "sendingrate";
-
-  public static final String SESSION_PORT = "SessionPort";
-
-  public static final String SESSION_STATS_PREFIX = "SessionStats_prefix";
-
-  public static final String SESSION_URL = "SessionUrl";
-
-  public static final String SPARK_APP_NAME = "spark.app.name";
-
-  public static final String SPARK_MASTER = "spark.master";
+  public static final String METADATA_DOWNLOAD = "mudrod.metadata.download";
+  
+  public static final String RAW_METADATA_PATH = "mudrod.metadata.path";
+
+  public static final String RAW_METADATA_TYPE = "mudrod.metadata.type";
+  
+  public static final String METADATA_MATRIX_PATH = 
"mudrod.metadata.matrix.path";
+  
+  public static final String METADATA_SVD_PATH = "mudrod.metadata.svd.path";
+  
+  public static final String RECOM_METADATA_TYPE = "recommedation.metadata";
+  
+  public static final String METADATA_ID = "mudrod.metadata.id";
+  
+  public static final String SEMANTIC_FIELDS = 
"mudrod.metadata.semantic.fields";
+  
+  public static final String METADATA_WORD_SIM_TYPE = "metadata.word.sim";
+  
+  public static final String METADATA_FEATURE_SIM_TYPE = 
"metadata.feature.sim";
+  
+  public static final String METADATA_SESSION_SIM_TYPE = 
"metadata.session.sim";
+  
+  public static final String METADATA_TERM_MATRIX_PATH = 
"metadata.term.matrix.path";
+  
+  public static final String METADATA_WORD_MATRIX_PATH = 
"metadata.word.matrix.path";
+  
+  public static final String METADATA_SESSION_MATRIX_PATH = 
"metadata.session.matrix.path";
+
+  public static final String REQUEST_RATE = "mudrod.request.rate";
+
+  public static final String SESSION_PORT = "mudrod.session.port";
+
+  public static final String SESSION_STATS_TYPE = "sessionstats";
+
+  public static final String SESSION_URL = "mudrod.session.url";
+
+  public static final String SPARK_APP_NAME = "mudrod.spark.app.name";
+
+  public static final String SPARK_MASTER = "mudrod.spark.master";
   /**
* Absolute local location of javaSVMWithSGDModel directory. This is 
typically
* 
file:///usr/local/mudrod/core/src/main/resources/javaSVMWithSGDModel
*/
-  public static final String SVM_SGD_MODEL = "svmSgdModel";
+  public static final String RANKING_MODEL = "mudrod.ranking.model";
 
-  public static final String TIMEGAP = "timegap";
+  public static final String REQUEST_TIME_GAP = "mudrod.request.time.gap";
 
   public static final String TIME_SUFFIX = "TimeSuffix";
 
 Review comment:
   Should be ```time.suffix```






[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389963#comment-16389963
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172922001
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/driver/ESDriver.java
 ##
 @@ -168,15 +167,22 @@ public String customAnalyzing(String indexName, String 
analyzer, String str) thr
 if (list == null) {
   return list;
 }
+int size = list.size();
 List customlist = new ArrayList<>();
-for (String aList : list) {
-  customlist.add(this.customAnalyzing(indexName, aList));
+for (int i = 0; i < size; i++) {
 
 Review comment:
   This is a regression; please revert.
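
For context, the form being reverted to is the enhanced `for` loop, which iterates the list directly and needs neither a `size` variable nor an index. The sketch below is an assumption-laden stand-in: `analyze` is a hypothetical placeholder for `ESDriver.customAnalyzing(indexName, str)`, which needs a live Elasticsearch client and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class EnhancedForExample {
  // Hypothetical stand-in for the real per-string analyzer call.
  static String analyze(String s) {
    return s.trim().toLowerCase(Locale.ROOT);
  }

  // Mirrors the shape of the list overload in the diff: null in, null out.
  static List<String> analyzeAll(List<String> list) {
    if (list == null) {
      return list;
    }
    List<String> customlist = new ArrayList<>();
    for (String item : list) {
      customlist.add(analyze(item));
    }
    return customlist;
  }
}
```

The enhanced loop also avoids repeated `get(i)` calls, which matters if the list is ever a linked list rather than an array-backed one.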




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389960#comment-16389960
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172921318
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/discoveryengine/WeblogDiscoveryEngine.java
 ##
 @@ -13,16 +13,16 @@
  */
 package org.apache.sdap.mudrod.discoveryengine;
 
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.FileStatus;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
 import org.apache.sdap.mudrod.driver.ESDriver;
 import org.apache.sdap.mudrod.driver.SparkDriver;
 import org.apache.sdap.mudrod.main.MudrodConstants;
 import org.apache.sdap.mudrod.weblog.pre.*;
 
 Review comment:
   Please never use wildcard imports. It is messy and leads to confusion.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390017#comment-16390017
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172945299
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/pre/SessionCooccurence.java
 ##
 @@ -131,15 +138,16 @@ public Object execute(Object o) {
   private Map getOnServiceMetadata(ESDriver es) {
 
 String indexName = props.getProperty(MudrodConstants.ES_INDEX_NAME);
-String metadataType = props.getProperty("recom_metadataType");
+String metadataType = MudrodConstants.RECOM_METADATA_TYPE;
 
 Map shortnameMap = new HashMap<>();
 SearchResponse scrollResp = 
es.getClient().prepareSearch(indexName).setTypes(metadataType).setScroll(new 
TimeValue(6)).setQuery(QueryBuilders.matchAllQuery()).setSize(100).execute()
 .actionGet();
 while (true) {
   for (SearchHit hit : scrollResp.getHits().getHits()) {
 Map metadata = hit.getSource();
-String shortName = (String) metadata.get("Dataset-ShortName");
+//String shortName = (String) metadata.get("Dataset-ShortName");
 
 Review comment:
   Remove commented out code




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390002#comment-16390002
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172926397
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/main/MudrodEngine.java
 ##
 @@ -267,6 +254,7 @@ public void startFullIngest() {
   /**
* Only preprocess various {@link DiscoveryEngineAbstract} implementations 
for
* weblog, ontology and metadata, linkage discovery and integration.
+   * This command dose not perform log preprocessing
 
 Review comment:
   ```dose``` should be ```does```; please correct.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389966#comment-16389966
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172923320
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/integration/LinkageIntegration.java
 ##
 @@ -173,32 +171,32 @@ public JsonObject getIngeratedListInJson(String input) {
* the similarities from different sources
*/
   public Map 
aggregateRelatedTermsFromAllmodel(String input) {
-aggregateRelatedTerms(input, props.getProperty("userHistoryLinkageType"));
-aggregateRelatedTerms(input, props.getProperty("clickStreamLinkageType"));
-aggregateRelatedTerms(input, props.getProperty("metadataLinkageType"));
-aggregateRelatedTermsSWEET(input, 
props.getProperty("ontologyLinkageType"));
+aggregateRelatedTerms(input, MudrodConstants.USE_HISTORY_LINKAGE_TYPE);
 
 Review comment:
   Any instance of this should be 
   ```
   props.getProperty(MudrodConstants.USE_HISTORY_LINKAGE_TYPE)
   ```
   Please update all of them.
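
The distinction behind this comment is that a `MudrodConstants` field holds the property key, not the configured value, so passing the constant directly hands the key string to the caller. A minimal sketch of the difference, assuming a hypothetical key string and value (the real ones are not shown in this diff):

```java
import java.util.Properties;

class PropertyLookupExample {
  // Hypothetical key; the actual string behind MudrodConstants.USE_HISTORY_LINKAGE_TYPE is not shown here.
  static final String USE_HISTORY_LINKAGE_TYPE = "mudrod.use.history.linkage";

  // Resolves the configured value for the key, as the review comment requests.
  static String linkageType(Properties props) {
    return props.getProperty(USE_HISTORY_LINKAGE_TYPE);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(USE_HISTORY_LINKAGE_TYPE, "UserHistoryLinkage"); // assumed value
    System.out.println(linkageType(props));       // the configured value
    System.out.println(USE_HISTORY_LINKAGE_TYPE); // the key, which is what the diff passes
  }
}
```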




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389978#comment-16389978
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172927978
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/metadata/structure/PODAACMetadata.java
 ##
 @@ -16,322 +16,368 @@
 import java.io.Serializable;
 import java.util.ArrayList;
 import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ExecutionException;
+
+import org.apache.sdap.mudrod.driver.ESDriver;
 
 /**
  * ClassName: PODAACMetadata Function: PODAACMetadata setter and getter methods
  */
-public class PODAACMetadata implements Serializable {
-
-  /**
-   *
-   */
-  private static final long serialVersionUID = 1L;
-  // shortname: data set short name
-  private String shortname;
-  // abstractStr: data set abstract
-  private String abstractStr;
-  // isoTopic: data set topic
-  private String isoTopic;
-  // sensor: sensor
-  private String sensor;
-  // source: data source
-  private String source;
-  // project: data project
-  private String project;
-  // hasAbstarct: whether data set has abstract
-  boolean hasAbstarct;
-
-  // longnameList: data set long name list
-  private List longnameList;
-  // keywordList:data set key word list
-  private List keywordList;
-  // termList: data set term list
-  private List termList;
-  // topicList: data set topic list
-  private List topicList;
-  // variableList: data set variable list
-  private List variableList;
-  // abstractList: data set abstract term list
-  private List abstractList;
-  // isotopicList: data set iso topic list
-  private List isotopicList;
-  // sensorList: data set sensor list
-  private List sensorList;
-  // sourceList: data set source list
-  private List sourceList;
-  // projectList: data set project list
-  private List projectList;
-  // regionList: data set region list
-  private List regionList;
-
-  public PODAACMetadata() {
-// Default constructor
-  }
-
-  /**
-   * Creates a new instance of PODAACMetadata.
-   *
-   * @param shortname data set short name
-   * @param longname  data set long name
-   * @param topicsdata set topics
-   * @param terms data set terms
-   * @param variables data set variables
-   * @param keywords  data set keywords
-   * @param regionlist of regions
-   */
-  public PODAACMetadata(String shortname, List longname, List 
topics, List terms, List variables, List keywords, 
List region) {
-this.shortname = shortname;
-this.longnameList = longname;
-this.keywordList = keywords;
-this.termList = terms;
-this.topicList = topics;
-this.variableList = variables;
-this.regionList = region;
-  }
-
-  /**
-   * setTerms: set term of data set
-   *
-   * @param termstr data set terms
-   */
-  public void setTerms(String termstr) {
-this.splitString(termstr, this.termList);
-  }
-
-  /**
-   * setKeywords: set key word of data set
-   *
-   * @param keywords data set keywords
-   */
-  public void setKeywords(String keywords) {
-this.splitString(keywords, this.keywordList);
-  }
-
-  /**
-   * setTopicList: set topic of data set
-   *
-   * @param topicStr data set topics
-   */
-  public void setTopicList(String topicStr) {
-this.splitString(topicStr, this.topicList);
-  }
-
-  /**
-   * setVaraliableList: set varilable of data set
-   *
-   * @param varilableStr data set variables
-   */
-  public void setVaraliableList(String varilableStr) {
-this.splitString(varilableStr, this.variableList);
-  }
-
-  /**
-   * setProjectList:set project of data set
-   *
-   * @param project data set projects
-   */
-  public void setProjectList(String project) {
-this.splitString(project, this.projectList);
-  }
-
-  /**
-   * setSourceList: set source of data set
-   *
-   * @param source data set sources
-   */
-  public void setSourceList(String source) {
-this.splitString(source, this.sourceList);
-  }
-
-  /**
-   * setSensorList: set sensor of data set
-   *
-   * @param sensor data set sensors
-   */
-  public void setSensorList(String sensor) {
-this.splitString(sensor, this.sensorList);
-  }
-
-  /**
-   * setISOTopicList:set iso topic of data set
-   *
-   * @param isoTopic data set iso topics
-   */
-  public void setISOTopicList(String isoTopic) {
-this.splitString(isoTopic, this.isotopicList);
-  }
-
-  /**
-   * getKeywordList: get key word of data set
-   *
-   * @return data set keyword list
-   */
-  public List getKeywordList() {
-return this.keywordList;
-  }
-
-  /**
-   * getTermList:get term list of data set
-   *
-   * @return data set term list
-   */
-  public List getTermList() {
-return this.termList;
-  }
-
-  /**
-   * getShortName:get short name of 

[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390015#comment-16390015
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172927031
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/metadata/structure/Metadata.java
 ##
 @@ -0,0 +1,67 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License"); you
+ * may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.sdap.mudrod.metadata.structure;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ExecutionException;
+
+import org.apache.sdap.mudrod.driver.ESDriver;
+
+/**
+ * ClassName: PODAACMetadata Function: PODAACMetadata setter and getter methods
+ */
+public abstract class Metadata implements Serializable  {
+
+  private static final long serialVersionUID = 1L;
+  // shortname: data set short name
+  protected String shortname;
+  
+  public Metadata() {
+// Default constructor
+  }
+
+  /**
+   * Creates a new instance of PODAACMetadata.
+   *
+   * @param shortname data set short name
+   * @param longname  data set long name
+   * @param topicsdata set topics
+   * @param terms data set terms
+   * @param variables data set variables
+   * @param keywords  data set keywords
+   * @param regionlist of regions
+   */
+  public Metadata(String shortname) {
 
 Review comment:
   The Javadoc here is completely incorrect.
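
The Javadoc was copied from the seven-argument PODAACMetadata constructor, while this constructor takes only `shortname`. One possible shape for a corrected comment is sketched below; the class name `MetadataSketch`, the wording of the descriptions and the `getShortName` accessor are assumptions, not the text that was eventually committed.

```java
import java.io.Serializable;

/**
 * Base type for data set metadata records (illustrative stand-in for Metadata).
 */
abstract class MetadataSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  /** Data set short name. */
  protected String shortname;

  /**
   * Creates a metadata record identified by its short name.
   *
   * @param shortname data set short name
   */
  MetadataSketch(String shortname) {
    this.shortname = shortname;
  }

  /**
   * @return the data set short name
   */
  String getShortName() {
    return shortname;
  }
}
```

Each `@param` tag now corresponds to a parameter that actually exists, which is also what the `javadoc` tool checks when doclint is enabled.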




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1638#comment-1638
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172928103
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/ontology/pre/AggregateTriples.java
 ##
 @@ -167,17 +169,17 @@ public Element findChild(String str, Element ele) {
   public void getAllClass() throws IOException {
 List classElements = rootNode.getChildren("Class", 
Namespace.getNamespace("owl", owl_namespace));
 
-for (Object classElement1 : classElements) {
-  Element classElement = (Element) classElement1;
+for (int i = 0; i < classElements.size(); i++) {
 
 Review comment:
   Revert this.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389969#comment-16389969
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172926606
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/metadata/pre/ApiHarvester.java
 ##
 @@ -125,7 +125,9 @@ private void harvestMetadatafromWeb() {
 int doc_length = 0;
 JsonParser parser = new JsonParser();
 do {
-  String searchAPI = "https://podaac.jpl.nasa.gov/api/dataset?startIndex=; 
+ Integer.toString(startIndex) + 
"=10=Dataset-AllTimePopularity=asc===";
+  //String searchAPI = 
"https://podaac.jpl.nasa.gov/api/dataset?startIndex=; + 
Integer.toString(startIndex) + 
"=10=Dataset-AllTimePopularity=asc===";
 
 Review comment:
   Remove this




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390001#comment-16390001
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172928252
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/ontology/process/EsipPortalOntology.java
 ##
 @@ -13,10 +13,10 @@
  */
 package org.apache.sdap.mudrod.ontology.process;
 
-import java.util.Iterator;
-
 import org.apache.sdap.mudrod.ontology.Ontology;
 
+import java.util.Iterator;
 
 Review comment:
   Is this import being used? If not, don't add it.




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390005#comment-16390005
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172945256
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/pre/SessionCooccurence.java
 ##
 @@ -109,7 +114,9 @@ public Object execute(Object o) {
   public Tuple2 call(Tuple2 
arg0) throws Exception {
 List oriDatasets = arg0._2;
 List newDatasets = new ArrayList<>();
-for (String name : oriDatasets) {
+int size = oriDatasets.size();
 
 Review comment:
   Revert this




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389971#comment-16389971
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172927334
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/metadata/structure/MetadataExtractor.java
 ##
 @@ -67,15 +67,15 @@ public MetadataExtractor() {
* @param type  metadata type name
* @return metadata list
*/
-  protected List loadMetadataFromES(ESDriver es, String index, 
String type) {
+  protected List loadMetadataFromES(ESDriver es, String index, 
String type) {
 
-List metadatas = new ArrayList();
+List metadatas = new ArrayList();
 
 Review comment:
   Just use the diamond operator; there is no need to define the type twice. It should be
   ```
   List metadatas = new ArrayList<>();
   ```
   I would also suggest that you change the variable name to something more 
descriptive, such as ```metadataList```.
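
As a minimal sketch of the suggestion above (the element type `String`, the class name `DiamondExample`, and the method `loadNames` are assumptions made for illustration, since the generic type in the quoted diff did not survive), the diamond operator lets the compiler infer the type argument from the declaration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class; not part of the MUDROD code base.
class DiamondExample {

  static List<String> loadNames() {
    // The type argument is written once, on the left-hand side.
    // The diamond <> tells the compiler to infer it for ArrayList.
    List<String> metadataList = new ArrayList<>();
    metadataList.add("example");
    return metadataList;
  }

  public static void main(String[] args) {
    System.out.println(loadNames());
  }
}
```

The diamond form requires Java 7 or later and behaves the same as repeating the type argument on the right-hand side.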




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389997#comment-16389997
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172945535
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/recommendation/structure/HybridRecommendation.java
 ##
 @@ -183,8 +183,9 @@ protected JsonElement mapToJson(Map 
wordweights, int num) {
 Map sortedMap = new HashMap<>();
 try {
   List links = getRelatedDataFromES(type, input, num);
-  for (LinkedTerm link : links) {
-termsMap.put(link.term, link.weight);
+  int size = links.size();
 
 Review comment:
   Revert this




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389982#comment-16389982
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172924046
 
 

 ##
 File path: core/src/main/java/org/apache/sdap/mudrod/main/MudrodConstants.java
 ##
 @@ -13,63 +13,73 @@
  */
 package org.apache.sdap.mudrod.main;
 
-import org.apache.sdap.mudrod.ontology.Ontology;
-
 /**
  * Class contains static constant keys and values relating to Mudrod
  * configuration properties. Property values are read from <a href="https://github.com/mudrod/mudrod/blob/master/core/src/main/resources/config.xml">config.xml</a>
  */
 public interface MudrodConstants {
 
-  public static final String CLEANUP_TYPE_PREFIX = "Cleanup_type_prefix";
-
-  public static final String CLICK_STREAM_LINKAGE_TYPE = 
"clickStreamLinkageType";
+  public static final String CLEANUP_TYPE = "cleanupLog";
 
-  public static final String CLICK_STREAM_MATRIX_TYPE = 
"clickStreamMatrixType";
+  public static final String CLICK_STREAM_LINKAGE_TYPE = "clickStreamLinkage";
 
-  public static final String CLICKSTREAM_SVD_DIM = "clickstreamSVDDimension";
+  public static final String CLICK_STREAM_MATRIX_TYPE = "clickStreamMatrix";
 
 Review comment:
   Should be ```click.stream.matrix```
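
A rough sketch of what a dot-separated key could look like in use, assuming the value is looked up through `java.util.Properties` (the class `DottedKeyExample`, the `lookup` helper, and the stored value are hypothetical and not taken from the pull request):

```java
import java.util.Properties;

// Hypothetical class; illustrates the key naming only.
class DottedKeyExample {

  // Constant whose value follows the dot-separated form suggested above.
  static final String CLICK_STREAM_MATRIX_TYPE = "click.stream.matrix";

  static String lookup(Properties props) {
    // getProperty returns null when the key is absent.
    return props.getProperty(CLICK_STREAM_MATRIX_TYPE);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    // Placeholder value, chosen only for this example.
    props.setProperty("click.stream.matrix", "clickStreamMatrix");
    System.out.println(lookup(props));
  }
}
```

Keeping the literal key in one constant means a later rename touches a single line.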




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389965#comment-16389965
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172921807
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/discoveryengine/WeblogDiscoveryEngine.java
 ##
 @@ -140,8 +140,8 @@ public void preprocess() {
   public void logIngest() {
 LOG.info("Starting Web log ingest.");
 ArrayList inputList = (ArrayList) 
getFileList(props.getProperty(MudrodConstants.DATA_DIR));
-for (String anInputList : inputList) {
-  timeSuffix = anInputList;
+for (int i = 0; i < inputList.size(); i++) {
 
 Review comment:
   This is a regression and should be reverted.
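
For context, a small sketch comparing the two loop forms in the diff above (the class `LoopStyleExample` and both method names are hypothetical, and the element type `String` is taken from the removed line). Both visit the same elements in the same order; the enhanced form avoids the index variable and the repeated `get(i)` call:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class; not part of the MUDROD code base.
class LoopStyleExample {

  // Indexed form, as introduced by the change under review.
  static List<String> collectIndexed(List<String> inputList) {
    List<String> seen = new ArrayList<>();
    for (int i = 0; i < inputList.size(); i++) {
      seen.add(inputList.get(i));
    }
    return seen;
  }

  // Enhanced for form, as in the removed lines.
  static List<String> collectEnhanced(List<String> inputList) {
    List<String> seen = new ArrayList<>();
    for (String anInputList : inputList) {
      seen.add(anInputList);
    }
    return seen;
  }

  public static void main(String[] args) {
    List<String> files = Arrays.asList("a.log", "b.log");
    System.out.println(collectIndexed(files).equals(collectEnhanced(files)));
  }
}
```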




[jira] [Commented] (SDAP-35) Overhaul MUDROD configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SDAP-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390009#comment-16390009
 ] 

ASF GitHub Bot commented on SDAP-35:


lewismc commented on a change in pull request #6: SDAP-35 (completed the 
configuration change)
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/6#discussion_r172946369
 
 

 ##
 File path: 
core/src/main/java/org/apache/sdap/mudrod/ssearch/ranking/DataGenerator.java
 ##
 @@ -34,10 +33,10 @@
   private static boolean isMultFiles;
 
   private static String[] myHeader;
-  private static List myMasterList = new ArrayList<>();
+  private static List myMasterList = new 
ArrayList();
 
 Review comment:
   Just use the diamond operator; don't add the type to the right-hand side of the assignment.



