Re: [Dev] Issues when using DAS features in an external Spark (non-OSGi) environment
Hi Gihan, IMHO the recommended behaviour of the server should be its default configuration, and our recommendation is to configure purging so that the server runs smoothly without accumulating a large amount of data in the data store. Can we therefore ship a reasonable purging configuration enabled by default, with a data retention period of 3 months / 90 days, and run the purging task every week? Thanks, Sinthuja.
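A minimal sketch of what such a default might look like, assuming the analytics-data-purging schema shown in the configuration later in this thread; the weekly Quartz cron expression (midnight every Sunday) is an illustrative choice, not taken from the original mail:

   <analytics-data-purging>
      <purging-enable>true</purging-enable>
      <purge-node>true</purge-node>
      <!-- Illustrative weekly schedule: 00:00 every Sunday (Quartz cron syntax). -->
      <cron-expression>0 0 0 ? * SUN</cron-expression>
      <purge-include-table-patterns>
         <table>.*</table>
      </purge-include-table-patterns>
      <!-- 90 days, i.e. the proposed 3-month retention period. -->
      <data-retention-days>90</data-retention-days>
   </analytics-data-purging>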
Re: [Dev] Issues when using DAS features in an external Spark (non-OSGi) environment
Hi Nirmal, Thanks for sharing the necessary details. This happens because the data purging configuration is enabled in analytics-conf.xml, and it uses the task service internally. Can you please comment out the analytics purging configuration in repository/conf/analytics/analytics-conf.xml and try again? Thanks, Sinthuja.
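For reference, a sketch of that workaround, assuming the element names from the configuration quoted later in this thread: the entire purging block in repository/conf/analytics/analytics-conf.xml is wrapped in an XML comment so that it no longer triggers the internal task lookup.

   <!-- Analytics purging commented out to avoid the task-service lookup
        on external (non-OSGi) Spark executors:
   <analytics-data-purging>
      <purging-enable>false</purging-enable>
      <purge-node>true</purge-node>
      <cron-expression>0 0 0 * * ?</cron-expression>
      <purge-include-table-patterns>
         <table>.*</table>
      </purge-include-table-patterns>
      <data-retention-days>365</data-retention-days>
   </analytics-data-purging>
   -->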
Re: [Dev] Issues when using DAS features in an external Spark (non-OSGi) environment
Hi Nirmal, DAS features such as script scheduling, purging, etc. are used to submit jobs (only Spark queries) to the external Spark cluster; the jars for those DAS features do not need to exist inside the external Spark cluster instance. For example, consider the scheduled execution of a Spark script, which uses the Task OSGi service: the task is triggered within the DAS node (the OSGi environment), and when Spark is configured externally the job is handed over to the external cluster, with the results then returned to the DAS node. Therefore I don't think any of the DAS feature jars, other than the DAL feature jars, need to be inside the external Spark cluster. Can you please explain your use case in more detail, and how you have configured the setup with the DAS features? Thanks, Sinthuja. On Sunday, June 28, 2015, Nirmal Fernando nir...@wso2.com wrote: Hi DAS team, It appears that we have to think about and implement DAS features so that they run even in a non-OSGi environment, such as an external Spark scenario. We have some DAS features that depend on the Task Service etc., and they fail when we use them from within a Spark job that runs on an external Spark cluster. How can we solve this? -- Thanks & regards, Nirmal
Re: [Dev] Issues when using DAS features in an external Spark (non-OSGi) environment
Hi Nirmal, When purging is disabled, an already-registered purging task (if any) is deleted, so the task service is accessed regardless of whether purging is enabled or disabled. However, we can check for the existence of the task service and perform the analytics purging operations if and only if the task service is registered; with that we can resolve the issue irrespective of the above configuration. We can also log a warning message if the purging task is enabled but the task service is not registered. @Gihan: I think purging needs to be enabled by default for continuous operation with an RDBMS datasource, so that too much data does not accumulate in the datasource. Is there any reason for it to be disabled? Thanks, Sinthuja.
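A rough sketch of the guard described above; every name here (PurgingInitializer, TaskService, the schedule/delete helpers) is a hypothetical stand-in rather than the actual carbon-analytics API:

   // Hypothetical sketch: the types and method names are illustrative
   // stand-ins, not the real carbon-analytics task API.
   public class PurgingInitializer {

       // Stand-in for the OSGi task service this thread discusses.
       interface TaskService {
           void scheduleTask(String name, String cronExpression);
           void deleteTask(String name);
       }

       private static final String PURGING_TASK_NAME = "ANALYTICS_DATA_PURGING";

       private final TaskService taskService; // null outside OSGi, e.g. on a Spark executor
       private final boolean purgingEnabled;
       private final String cronExpression;

       PurgingInitializer(TaskService taskService, boolean purgingEnabled, String cronExpression) {
           this.taskService = taskService;
           this.purgingEnabled = purgingEnabled;
           this.cronExpression = cronExpression;
       }

       void init() {
           if (taskService == null) {
               // No task service registered: skip all purging work instead of
               // throwing an NPE, and warn only if purging was actually requested.
               if (purgingEnabled) {
                   System.err.println("WARN: purging is enabled but the task service"
                           + " is not registered; skipping purging task registration.");
               }
               return;
           }
           if (purgingEnabled) {
               taskService.scheduleTask(PURGING_TASK_NAME, cronExpression);
           } else {
               // Purging disabled: clean up any previously registered purging task.
               taskService.deleteTask(PURGING_TASK_NAME);
           }
       }
   }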
Re: [Dev] Issues when using DAS features in an external Spark (non-OSGi) environment
Hi Sinthuja, Thanks for the explanation. I think I should have said DAL instead of DAS; what I am talking about here are the DAL features. The exact error is [1], and the reason for it is the TaskService being null. Can you please check?

[1]
15/06/28 11:54:51 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.4 KB, free 265.1 MB)
15/06/28 11:55:02 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NullPointerException
    at org.wso2.carbon.analytics.dataservice.AnalyticsDataServiceImpl.<init>(AnalyticsDataServiceImpl.java:149)
    at org.wso2.carbon.analytics.dataservice.AnalyticsServiceHolder.checkAndPopulateCustomAnalyticsDS(AnalyticsServiceHolder.java:79)
    at org.wso2.carbon.analytics.dataservice.AnalyticsServiceHolder.getAnalyticsDataService(AnalyticsServiceHolder.java:67)
    at org.wso2.carbon.analytics.spark.core.internal.ServiceHolder.getAnalyticsDataService(ServiceHolder.java:73)
    at org.wso2.carbon.analytics.spark.core.util.AnalyticsRDD.compute(AnalyticsRDD.java:81)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

-- Thanks & regards, Nirmal
Re: [Dev] Issues when using DAS features in an external Spark (non-OSGi) environment
That worked, Sinthuja! Thanks. However, is it possible to disable the Task Service initialization if purging is disabled (which is the default behaviour)?

<analytics-data-purging>
   <purging-enable>false</purging-enable>
   <purge-node>true</purge-node>
   <cron-expression>0 0 0 * * ?</cron-expression>
   <purge-include-table-patterns>
      <table>.*</table>
   </purge-include-table-patterns>
   <data-retention-days>365</data-retention-days>
</analytics-data-purging>

-- Thanks & regards, Nirmal
Re: [Dev] Issues when using DAS features in an external Spark (non-OSGi) environment
Hi Sinthuja, Yes, by default data purging is disabled. It is a DevOps decision to enable or disable the purging task and to come up with suitable inputs such as the retention period, the tables on which purging is enabled, etc. Regards, Gihan